Abstract

We provide a systematic approach to deal with the following problem. Let $X_1, \dots, X_n$ be, possibly dependent, $[0,1]$-valued random variables. What is a sharp upper bound on the probability that their sum is significantly larger than their mean? In the case of independent random variables, a fundamental tool for bounding such probabilities was devised by Wassily Hoeffding. In this paper we consider analogues of Hoeffding's result for sums of dependent random variables for which we have certain information on their dependency structure. We prove a result that yields concentration inequalities for several notions of weak dependence between random variables. Additionally, we obtain a new concentration inequality for sums of, possibly dependent, $[0,1]$-valued random variables, $X_1, \dots, X_n$, that satisfy the following condition: there exist constants $\gamma \in (0,1)$ and $\delta \in (0,1]$ such that for every subset $A \subseteq \{1, \dots, n\}$ we have $\mathbb{E}\big[\prod_{i\in A}X_i\prod_{i\notin A}(1-X_i)\big] \le \gamma^{|A|}\delta^{n-|A|}$, where $|A|$ denotes the cardinality of $A$. Our approach applies to several sums of weakly dependent random variables, such as sums of martingale difference sequences, sums of $k$-wise independent random variables and U-statistics. Finally, we discuss some applications to the theory of random graphs.
1 Prologue, related work and main results


1.1 Covariance estimates 


The main purpose of this work is to obtain extensions of Hoeffding's inequality to sums of weakly dependent random variables. In order to emphasize the analogy between existing results and the results that follow, let us begin right away by stating Hoeffding's well-known theorem (see [18], Theorem 1). Throughout the text, $\mathbb{E}[\cdot]$ and $\mathbb{P}[\cdot]$ will denote expectation and probability, respectively.


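Theorem 1.1 (Hoeffding, 1963). Let $X_1, \dots, X_n$ be independent random variables such that $0 \le X_i \le 1$, for each $i = 1, \dots, n$. Set $p = \frac{1}{n}\sum_{i=1}^n \mathbb{E}[X_i]$ and fix a real number $\varepsilon$ from the interval $\left(0, \frac{1}{p} - 1\right)$. If $t = np + np\varepsilon$, then

$$\mathbb{P}\Big[\sum_{i=1}^n X_i \ge t\Big] \le \inf_{h>0} e^{-ht}\left(1 - p + pe^h\right)^n.$$

Furthermore,

$$\inf_{h>0} e^{-ht}\left(1 - p + pe^h\right)^n = p^t(1-p)^{n-t}\left(\frac{n}{t}\right)^t\left(\frac{n}{n-t}\right)^{n-t} =: H(n,p,t)$$

and

$$H(n,p,t) = e^{-nD(p+p\varepsilon\,\|\,p)},$$

where, for $q, p \in (0,1)$, $D(q\|p) = q\ln\frac{q}{p} + (1-q)\ln\frac{1-q}{1-p}$ is the Kullback-Leibler distance between $q$ and $p$.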
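As a quick numerical sanity check of Theorem 1.1, the following Python sketch (our own illustration; the function names `kl_bernoulli`, `hoeffding_function` and `mgf_bound` are hypothetical, and the parameter values are arbitrary) evaluates the Hoeffding function, compares it with its Kullback-Leibler form, confirms that a grid search over $h$ does not go below it, and checks the foolproof estimate $D(q\|p) \ge 2(q-p)^2$.

```python
import math

def kl_bernoulli(q, p):
    """Kullback-Leibler distance D(q||p) between Bernoulli(q) and Bernoulli(p)."""
    return q * math.log(q / p) + (1 - q) * math.log((1 - q) / (1 - p))

def hoeffding_function(n, p, t):
    """H(n, p, t) = p^t (1-p)^(n-t) (n/t)^t (n/(n-t))^(n-t), for np < t < n."""
    return (n * p / t) ** t * (n * (1 - p) / (n - t)) ** (n - t)

def mgf_bound(n, p, t, h):
    """e^{-ht} (1 - p + p e^h)^n, the quantity inside the infimum of Theorem 1.1."""
    return math.exp(-h * t) * (1 - p + p * math.exp(h)) ** n

n, p, eps = 100, 0.3, 0.5
t = n * p * (1 + eps)                      # t = np + np*eps = 45
H = hoeffding_function(n, p, t)

# Closed form in terms of the Kullback-Leibler distance.
assert math.isclose(H, math.exp(-n * kl_bernoulli(p * (1 + eps), p)), rel_tol=1e-9)
# A grid search over h > 0 cannot go below the infimum H, but gets close to it.
grid_min = min(mgf_bound(n, p, t, i / 1000) for i in range(1, 5000))
assert H * (1 - 1e-9) <= grid_min <= H * (1 + 1e-4)
# Foolproof version: D(q||p) >= 2(q - p)^2.
assert H <= math.exp(-2 * n * (p * eps) ** 2)
```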

The function $H(n,p,t)$ is the so-called Hoeffding function. The estimate

$$\mathbb{P}\Big[\sum_{i=1}^n X_i \ge t\Big] \le e^{-nD(p+p\varepsilon\,\|\,p)},$$

i.e. the Hoeffding function expressed in terms of the Kullback-Leibler distance, is referred to as the Chernoff-Hoeffding bound. In other words, Hoeffding's result provides an upper bound on the probability that a sum of independent and bounded random variables is significantly larger than its expected value. We remark that a foolproof version of the bound can be obtained using the standard estimate on the Kullback-Leibler distance: $D(q\|p) \ge 2(q-p)^2$, for $q, p \in (0,1)$ such that $q > p$. Hoeffding's inequality is a folklore result that has proven useful in a plethora of problems in combinatorics, probability, statistics and theoretical computer science. However, there are several instances in which one is dealing with sums of bounded random variables that are not independent; as an example, the reader may think of the number of triangles in an Erdős-Rényi random graph. Such instances have been encountered by several authors for a variety of questions which, in turn, gave rise to the problem of obtaining analogues of Theorem 1.1 for sums of dependent random variables, under certain assumptions on their dependency structure. The literature that treats the problem of extending Hoeffding's theorem to sums of dependent random variables is vast, and the interested reader is invited to take a look at the works of Azuma [3], Bentkus [6], Delyon [7], Fan et al. [10], Gavinsky et al. [11], Gradwohl et al. [14], Hazla et al. [15], Impagliazzo et al. [19], Janson [20], Kallabis et al. [23], Kontorovich et al. [25], Linial et al. [26], McDiarmid [27], Ramon et al. [31], Rio [32], Schmidt et al. [33], Siegel [34], Van de Geer [35], Vu [36], among others. Let us also remark that certain assumptions of "weak dependence" between the random variables are required in order to make the problem interesting. If the random variables are fully dependent then the problem is trivial; just let $X_1 = \cdots = X_n = \frac{t}{n}$ with probability $\frac{np}{t}$ and $X_1 = \cdots = X_n = 0$ with probability $1 - \frac{np}{t}$, where $0 < np < t \le n$. Then $\sum_i \mathbb{E}[X_i] = np$ and, for $t > \sum_i \mathbb{E}[X_i]$, Markov's inequality implies that $\mathbb{P}[\sum_i X_i \ge t] \le \frac{np}{t}$, while the latter collection of random variables attains this bound. This article may be regarded as an addendum to the aforementioned literature; we prove a result that can be employed in order to obtain concentration inequalities for sums of dependent random variables for which we have certain information on their dependency structure.

The exposition of our paper proceeds as follows. In the remaining part of the current section we formalise a particular type of "dependency structure" between bounded random variables and juxtapose existing bounds on the probability that their sum is larger than their mean with bounds obtained via our approach. There are several ways to describe a dependency structure between random variables, some of which will be discussed in the following subsections. Let us begin with a rather general description that assumes estimates on the "covariance structure" of the random variables and is contained in the following theorem, due to Impagliazzo and Kabanets [19]. Here and later, for a positive integer $n$, we will denote by $[n]$ the set $\{1, \dots, n\}$.

Theorem 1.2 (Impagliazzo & Kabanets, 2010). There exists a universal constant $c \ge 1$ satisfying the following. Suppose that $X_1, \dots, X_n$ are random variables such that $0 \le X_i \le 1$, for $i = 1, \dots, n$. Assume further that there exists a constant $\gamma \in (0,1)$ such that for all $A \subseteq [n]$ the following condition holds true:

$$\mathbb{E}\Big[\prod_{j\in A}X_j\Big] \le \gamma^{|A|},$$

where $|A|$ denotes the cardinality of $A$. Fix a real number $\varepsilon$ from the interval $\left(0, \frac{1}{\gamma} - 1\right)$ and set $t = n\gamma + n\gamma\varepsilon$. Then

$$\mathbb{P}\Big[\sum_{i=1}^n X_i \ge t\Big] \le c\,e^{-nD(\gamma(1+\varepsilon)\,\|\,\gamma)},$$

where $D(\gamma(1+\varepsilon)\|\gamma)$ is the Kullback-Leibler distance between $\gamma(1+\varepsilon)$ and $\gamma$.

Throughout the text, the empty product is interpreted as 1. See [19] for a neat proof of the previous result as well as for applications to direct products and expander graphs, among others. In the case of Bernoulli 0/1 random variables it is shown in [19], Theorem 3.1, that the constant $c$ in the previous theorem is equal to 1; however, the exact value of $c$ does not seem to be known in the case of general $[0,1]$-valued random variables. Moreover, in the case of Bernoulli 0/1 random variables, the following refinement upon Theorem 1.2 has been obtained by Linial and Luria [26].

Theorem 1.3 (Linial & Luria, 2014). Let $X_1, \dots, X_n$ be Bernoulli 0/1 random variables. Let $\beta \in (0,1)$ be such that $\beta n$ is a positive integer and let $k$ be any positive integer such that $0 < k < \beta n$. Then

$$\mathbb{P}\Big[\sum_{i=1}^n X_i \ge \beta n\Big] \le \frac{1}{\binom{\beta n}{k}}\sum_{A:|A|=k}\mathbb{E}\Big[\prod_{i\in A}X_i\Big].$$

See [26] for a very elementary proof of this result. Notice that the previous result reduces to Markov's inequality when $k = 1$. It can be seen, using standard entropy estimates of binomial coefficients, that Theorem 1.3 reduces to Theorem 1.2 in case one makes the additional assumption $\mathbb{E}[\prod_{i\in A}X_i] \le \gamma^{|A|}$, for all $A \subseteq [n]$. We provide two proofs of Theorem 1.3 in Section 2. The first proof is based upon the main result of our paper, which provides a concentration bound for sums of random variables expressed in terms of expectations with respect to convex functions. More precisely, a basic ingredient in the proof of most results in this paper is the following theorem. Here and later, we will denote by $\partial_j[n]$ the family consisting of all subsets of $[n]$ whose cardinality equals $j \in \{0, 1, \dots, n\}$.


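Theorem 1.4. Let $X_1, \dots, X_n$ be random variables such that $0 \le X_i \le 1$, for $i = 1, \dots, n$. For every subset $A \subseteq [n]$, define the random variable $Z_A$ by setting

$$Z_A = \prod_{i\in A}X_i\prod_{j\in[n]\setminus A}(1-X_j).$$

Let $\mathcal{F}$ be the set consisting of all functions $f : \mathbb{R} \to [0, +\infty)$ that are increasing and convex, and set $p := \frac{1}{n}\sum_{i=1}^n\mathbb{E}[X_i]$. If $t$ is a real number such that $np \le t \le n$, then $\sum_{A\subseteq[n]}\mathbb{E}[Z_A] = 1$ and

$$\mathbb{P}\Big[\sum_{i=1}^n X_i \ge t\Big] \le \inf_{f\in\mathcal{F}}\frac{1}{f(t)}\sum_{j=0}^n\sum_{A\in\partial_j[n]}\mathbb{E}[Z_A]\,f(j) = \inf_{f\in\mathcal{F}}\frac{\mathbb{E}[f(Z)]}{f(t)},$$

where $Z$ is the random variable that takes values in the set $\{0, 1, \dots, n\}$ with probability

$$\mathbb{P}[Z = j] = \sum_{A\in\partial_j[n]}\mathbb{E}[Z_A], \quad \text{for } j = 0, 1, \dots, n.$$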


Let us remark that the assumption $t \ge np$, in the previous theorem, is essential. Indeed, a first step in the proof of the previous theorem is an application of Markov's inequality:

$$\mathbb{P}\Big[\sum_{i=1}^n X_i \ge t\Big] \le \frac{\mathbb{E}\big[f\big(\sum_{i=1}^n X_i\big)\big]}{f(t)}.$$

Since $f$ is assumed to be convex, Jensen's inequality implies $\mathbb{E}[f(\sum_{i=1}^n X_i)] \ge f(np)$. Since $f$ is additionally assumed to be increasing, we have $f(t) \le f(np)$ for $t < np$, and so the aforementioned application of Markov's inequality cannot yield a useful estimate. If $t < np$, then the previous theorem, applied to $1 - X_1, \dots, 1 - X_n$, gives a useful upper bound on the probability $\mathbb{P}[\sum_{i=1}^n X_i \le t] = \mathbb{P}[\sum_{i=1}^n (1 - X_i) \ge n - t]$; so we may choose to work with the upper tail. We prove Theorem 1.4 in Section 2. The proof makes use of an elementary result (Lemma 2.1 below) that allows one to write a sum of $n$ real numbers from the interval $[0,1]$ as a convex combination of the set of integers $\{0, 1, \dots, n\}$. We also show that, in the case of independent random variables, Theorem 1.4 reduces to Hoeffding's Theorem 1.1. It turns out that Theorem 1.4 can be employed in order to obtain concentration inequalities for several sums of weakly dependent random variables, such as martingale difference sequences, $k$-wise independent random variables and sums of Bernoulli 0/1 random variables whose dependency structure is given in terms of a graph. We illustrate this in the following subsections. Let us begin with a consequence of Theorem 1.4 that may be seen as a generalisation of Hoeffding's Theorem 1.1.


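Theorem 1.5. Suppose that $X_1, \dots, X_n$ are random variables such that $0 \le X_i \le 1$, for $i = 1, \dots, n$. Assume further that there exist constants $\gamma \in (0,1)$ and $\delta \in (0,1]$ such that for all $A \subseteq [n]$ the following condition holds true:

$$\mathbb{E}[Z_A] \le \gamma^{|A|}\delta^{n-|A|}, \quad \text{where } Z_A = \prod_{i\in A}X_i\prod_{j\in[n]\setminus A}(1-X_j),$$

and $|A|$ denotes the cardinality of $A$. Fix a real number $\varepsilon$ from the interval $\left(0, \frac{1}{\gamma} - 1\right)$ and set $t = n\gamma + n\gamma\varepsilon$. Then

$$\mathbb{P}\Big[\sum_{i=1}^n X_i \ge t\Big] \le \gamma^t\delta^{n-t}\left(\frac{n}{t}\right)^t\left(\frac{n}{n-t}\right)^{n-t}.$$

Furthermore,

$$\gamma^t\delta^{n-t}\left(\frac{n}{t}\right)^t\left(\frac{n}{n-t}\right)^{n-t} \le e^{-n\left(D(\gamma(1+\varepsilon)\,\|\,\gamma) - (1-\gamma(1+\varepsilon))\ln\frac{\delta}{1-\gamma}\right)},$$

where $D(\gamma(1+\varepsilon)\|\gamma)$ denotes the Kullback-Leibler distance between $\gamma(1+\varepsilon)$ and $\gamma$.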

We prove this result in Section 2. In other words, the previous result subtracts the term $(1-\gamma(1+\varepsilon))\ln\frac{\delta}{1-\gamma}$ from the exponent of the Chernoff-Hoeffding bound, to compensate for the fact that the random variables were not assumed to be independent. We remark that we always have $\gamma + \delta \ge 1$. To see this, notice that, since $\sum_{A\subseteq[n]}\mathbb{E}[Z_A] = 1$, the condition of the previous theorem implies

$$1 = \sum_{j=0}^n\sum_{A\in\partial_j[n]}\mathbb{E}[Z_A] \le \sum_{j=0}^n\binom{n}{j}\gamma^j\delta^{n-j} = (\gamma+\delta)^n.$$

Notice also that the factor $(1-\gamma(1+\varepsilon))\ln\frac{\delta}{1-\gamma}$ is not very large (for example, it is less than $(1-\gamma)\ln\frac{1}{1-\gamma} < 1$) and that the bound of Theorem 1.5 involves no unknown constants.
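The following Python sketch (an illustration under arbitrarily chosen parameters, not part of the proof; `thm15_bound` and `kl` are hypothetical helper names) evaluates the first bound of Theorem 1.5, checks that it agrees with the exponential form, and confirms that for $\delta = 1-\gamma$ the correction term vanishes, so that one recovers the Chernoff-Hoeffding bound.

```python
import math

def thm15_bound(n, gamma, delta, t):
    """gamma^t delta^(n-t) (n/t)^t (n/(n-t))^(n-t): the first bound of Theorem 1.5."""
    return gamma ** t * delta ** (n - t) * (n / t) ** t * (n / (n - t)) ** (n - t)

def kl(q, p):
    """Kullback-Leibler distance D(q||p) between Bernoulli(q) and Bernoulli(p)."""
    return q * math.log(q / p) + (1 - q) * math.log((1 - q) / (1 - p))

n, gamma, delta, eps = 200, 0.2, 0.85, 0.5
assert gamma + delta >= 1                  # forced by the hypothesis of Theorem 1.5
t = n * gamma * (1 + eps)
q = gamma * (1 + eps)

bound = thm15_bound(n, gamma, delta, t)
correction = (1 - q) * math.log(delta / (1 - gamma))
assert math.isclose(bound, math.exp(-n * (kl(q, gamma) - correction)), rel_tol=1e-9)

# With delta = 1 - gamma the correction vanishes and we recover Chernoff-Hoeffding.
hoeffding = math.exp(-n * kl(q, gamma))
assert math.isclose(thm15_bound(n, gamma, 1 - gamma, t), hoeffding, rel_tol=1e-9)

# The correction is at most (1 - gamma) ln(1/(1 - gamma)) < 1.
assert correction <= (1 - gamma) * math.log(1 / (1 - gamma)) < 1
```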


Remark 1.6. Theorem 1.5 should be considered as complementary to Theorem 1.2, in the sense that it may be applicable when an estimate of the form $\mathbb{E}[\prod_{i\in A}X_i] \le \gamma^{|A|}$ is not available and, instead, an estimate of the form $\mathbb{E}[Z_A] \le \gamma^{|A|}\delta^{n-|A|}$ is available. Let us also remark that an estimate of the former form cannot be concluded from an estimate of the latter form, and so one cannot conclude Theorem 1.5 as a consequence of Theorem 1.2. To be more precise, let us look at the case of Bernoulli 0/1 random variables. In that case the constant $c$ in Theorem 1.2 equals 1 (see [19], Theorem 3.1). Under the assumption $\mathbb{E}[Z_A] \le \gamma^{|A|}\delta^{n-|A|}$, for all $A \subseteq [n]$, and since the random variables are Bernoulli, we have

$$\mathbb{E}\Big[\prod_{j\in A}X_j\Big] = \sum_{T:A\subseteq T}\mathbb{E}[Z_T] \le \sum_{T:A\subseteq T}\gamma^{|T|}\delta^{n-|T|} = \gamma^{|A|}(\gamma+\delta)^{n-|A|},$$

and so, since $\gamma + \delta \ge 1$, Theorem 1.5 is dealing with an estimate on $\mathbb{E}[\prod_{j\in A}X_j]$ that is, for fixed $\gamma$, larger than the corresponding estimate in Theorem 1.2.
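The identity used in Remark 1.6 can be checked by brute force for a small $n$. In the Python sketch below, `superset_sum` is a hypothetical helper and the values of $\gamma$ and $\delta$ are arbitrary.

```python
import itertools
import math

def superset_sum(n, A, gamma, delta):
    """Sum of gamma^|T| * delta^(n - |T|) over all T with A subset of T subset of {0, ..., n-1}."""
    rest = [i for i in range(n) if i not in A]
    total = 0.0
    for r in range(len(rest) + 1):
        for _extra in itertools.combinations(rest, r):
            size = len(A) + r
            total += gamma ** size * delta ** (n - size)
    return total

n, gamma, delta = 6, 0.3, 0.8          # gamma + delta >= 1
for a in range(n + 1):
    A = set(range(a))
    closed_form = gamma ** a * (gamma + delta) ** (n - a)
    assert math.isclose(superset_sum(n, A, gamma, delta), closed_form, rel_tol=1e-12)
    # Weaker than the estimate E[prod_{i in A} X_i] <= gamma^|A| of Theorem 1.2.
    assert closed_form >= gamma ** a
```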


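The second proof of Theorem 1.3 is obtained using a coupling argument. In fact, we prove a bit more.

Theorem 1.7. Let $X_1, \dots, X_n$ be Bernoulli 0/1 random variables. Let $\beta \in (0,1)$ be such that $\beta n$ is a positive integer and let $k$ be any positive integer such that $0 < k < \beta n$. Then

$$\frac{1}{\binom{n}{\beta n}}\sum_{A:|A|=\beta n}\mathbb{E}\Big[\prod_{i\in A}X_i\Big] \le \mathbb{P}\Big[\sum_{i=1}^n X_i \ge \beta n\Big] \le \frac{1}{\binom{\beta n}{k}}\sum_{A:|A|=k}\mathbb{E}\Big[\prod_{i\in A}X_i\Big].$$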

In Section 2 we provide two proofs of the upper bound. Let us remark that the second proof is basically a paraphrase, in probabilistic language, of the combinatorial proof from [26]. Our proof is longer and somewhat uglier, but reveals a way to think of a lower bound. Let us also remark that Theorem 1.2 and Theorem 1.3 may be employed in order to obtain concentration bounds for particular sums of dependent indicators that are encountered in the theory of Erdős-Rényi random graphs; we illustrate this in Section 7. Moreover, we obtain the following result that is related to Theorem 1.2.


Theorem 1.8. Suppose that $X_1, \dots, X_n$ are random variables such that $0 \le X_i \le 1$, for $i = 1, \dots, n$. Set $p = \frac{1}{n}\sum_{i=1}^n\mathbb{E}[X_i]$ and fix a real number $t$ such that $np + 1 < t < n$. If $\varepsilon_0 > 0$ is such that $t - 1 = np + np\varepsilon_0$, then


P 


. ^=1 
where $D(p(1+\varepsilon_0)\|p)$ is the Kullback-Leibler distance between $p(1+\varepsilon_0)$ and $p$.

We prove this result in Section 2. Notice that the constant $c$ of Theorem 1.2 has been replaced by 2, but the parameter $\varepsilon_0$ is smaller than the corresponding parameter $\varepsilon$ in Theorem 1.2, which results in a slightly larger exponential bound.

It turns out that Theorem 1.4 applies to sums of martingale difference sequences. This is the content of the following subsection.


1.2 Martingales 

Martingales are sequences of random variables that exhibit a rather simple dependency structure. More precisely, a sequence $X_0, X_1, \dots$ of integrable random variables is called a martingale if

$$\mathbb{E}[X_{n+1} \mid \mathcal{A}_n] = X_n, \quad \text{for all } n \ge 0,$$

where $\mathcal{A}_n$ is the $\sigma$-algebra generated by the random variables $X_0, X_1, \dots, X_n$. A sequence $Y_1, Y_2, \dots$ of integrable random variables is called a martingale difference sequence if

$$\mathbb{E}[Y_n \mid \mathcal{X}_{n-1}] = 0, \quad \text{for all } n \ge 1,$$

where $\mathcal{X}_{n-1}$ is the $\sigma$-algebra generated by the random variables $Y_1, \dots, Y_{n-1}$ and $\mathcal{X}_0$ is the trivial $\sigma$-algebra. Given a martingale $X_0, X_1, \dots$, one can obtain a martingale difference sequence by setting $Y_k = X_k - X_{k-1}$, $k = 1, 2, \dots$, and, conversely, given $X_0$ and a martingale difference sequence $Y_1, Y_2, \dots$, one can obtain a martingale by setting $X_k = X_0 + \sum_{i=1}^k Y_i$. Therefore, one may choose to work with either sequence. Theorem 1.4 allows us to prove a refined version of a well-known result, due to McDiarmid [27], that provides a concentration inequality for sums of martingale difference sequences. McDiarmid's inequality has proven useful in several questions in combinatorics and probability, and reads as follows.


Theorem 1.9 (McDiarmid, 1989). Let $Y_1, \dots, Y_n$ be a martingale difference sequence with $-p_i \le Y_i \le 1 - p_i$, for $i = 1, \dots, n$, and suitable constants $p_i \in (0,1)$. Set $p = \frac{1}{n}\sum_{i=1}^n p_i$. Then, for any real $t$ such that $t \in (0, 1-p)$, we have

$$\mathbb{P}\Big[\sum_{i=1}^n Y_i \ge nt\Big] \le \inf_{h>0} e^{-h(nt+np)}\,\mathbb{E}\big[e^{hB_{n,p}}\big],$$

where $B_{n,p}$ is a binomial random variable of parameters $n$ and $p$. Furthermore,

$$\inf_{h>0} e^{-h(nt+np)}\,\mathbb{E}\big[e^{hB_{n,p}}\big] = \left\{\left(\frac{p}{p+t}\right)^{p+t}\left(\frac{1-p}{1-p-t}\right)^{1-p-t}\right\}^n =: H_M(n,p,t),$$

and the following foolproof version holds true:

$$H_M(n,p,t) \le \exp(-2nt^2).$$


See McDiarmid [27], Theorem 6.1, for a proof of this result. Let us remark that the function $H_M(n,p,t)$ is related to the Hoeffding function; in fact, given $n, p, t$ as in Theorem 1.9, we have $H_M(n,p,t) = H(n,p,n(p+t))$. Using Theorem 1.4 we deduce the following refined version of the previous result.
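Substituting the threshold $n(p+t)$ into the Hoeffding function of Theorem 1.1 gives $H(n,p,n(p+t))$, which coincides with $H_M(n,p,t)$. A minimal Python check of this identity and of the foolproof estimate, for one arbitrary choice of parameters (helper names are ours):

```python
import math

def hoeffding_function(n, p, t):
    """H(n, p, t) = (np/t)^t (n(1-p)/(n-t))^(n-t), as in Theorem 1.1."""
    return (n * p / t) ** t * (n * (1 - p) / (n - t)) ** (n - t)

def mcdiarmid_function(n, p, t):
    """H_M(n, p, t) = [ (p/(p+t))^(p+t) ((1-p)/(1-p-t))^(1-p-t) ]^n, for 0 < t < 1 - p."""
    base = (p / (p + t)) ** (p + t) * ((1 - p) / (1 - p - t)) ** (1 - p - t)
    return base ** n

n, p, t = 50, 0.4, 0.1
hm = mcdiarmid_function(n, p, t)
assert math.isclose(hm, hoeffding_function(n, p, n * (p + t)), rel_tol=1e-9)
assert hm <= math.exp(-2 * n * t * t)      # foolproof version
```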


Theorem 1.10. Let $Y_1, \dots, Y_n$ be a martingale difference sequence with $-p_i \le Y_i \le 1 - p_i$, for $i = 1, \dots, n$, and suitable constants $p_i \in (0,1)$. Set $p = \frac{1}{n}\sum_{i=1}^n p_i$. Let $\mathcal{F}$ be the set consisting of all functions $f : \mathbb{R} \to [0, +\infty)$ that are increasing and convex. Then, for any real $t$ such that $t \in (0, 1-p)$, we have

$$\mathbb{P}\Big[\sum_{i=1}^n Y_i \ge nt\Big] \le \inf_{f\in\mathcal{F}}\frac{\mathbb{E}[f(B_{n,p})]}{f(nt+np)},$$

where $B_{n,p}$ is a binomial random variable of parameters $n$ and $p$. Furthermore, if $n(p+t)$
is a positive integer and t satisfies < t < I — p, we have 

+ (l - ^ = n{p + t)], 

where 

Tin, p,t):= YI [B„,, = j], 

j<n{p+t)-l 

$H_M(n,p,t)$ is the function defined in Theorem 1.9 and $h$ is the positive real satisfying

$$e^{-hn(t+p)}\,\mathbb{E}\big[e^{hB_{n,p}}\big] = \inf_{s>0} e^{-(snt+snp)}\,\mathbb{E}\big[e^{sB_{n,p}}\big].$$

The bound is less than the bound of Theorem 1.9.
Let us prove the last statement of the previous result. To this end, notice that the bound of the second statement is

< + (^1 - ^[Bn,p = n{p + t)] := Q. 

Now Hoeffding's Theorem 1.1 implies that

$$\mathbb{P}[B_{n,p} = n(p+t)] \le \mathbb{P}[B_{n,p} \ge n(p+t)] \le H_M(n,p,t).$$

Since $Q$ is a convex combination of $\mathbb{P}[B_{n,p} = n(p+t)]$ and $H_M(n,p,t)$, it follows that the bound of the previous result is less than the bound of Theorem 1.9. The proofs of the remaining statements of Theorem 1.10 can be found in Section 3. Our approach uses Theorem 1.4 combined with extensions of ideas that we employed in previous work (see [29]). In the next subsection we apply Theorem 1.4 to another class of weakly dependent random variables.
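The chain of inequalities above can be checked numerically for a binomial random variable. The Python sketch below (our illustration; `lam` stands for an arbitrary mixing weight, since any convex combination of the two quantities is at most $H_M(n,p,t)$) runs over all integer thresholds $n(p+t)$ with $np < n(p+t) < n$, for one arbitrary choice of $n$ and $p$.

```python
import math

def binom_pmf(n, p, j):
    """P[B_{n,p} = j]."""
    return math.comb(n, j) * p ** j * (1 - p) ** (n - j)

def mcdiarmid_function(n, p, t):
    """H_M(n, p, t) from Theorem 1.9."""
    base = (p / (p + t)) ** (p + t) * ((1 - p) / (1 - p - t)) ** (1 - p - t)
    return base ** n

n, p = 40, 0.3                     # np = 12
for m in range(13, n):             # integer thresholds m = n(p + t) with np < m < n
    t = m / n - p
    point = binom_pmf(n, p, m)
    tail = sum(binom_pmf(n, p, j) for j in range(m, n + 1))
    hm = mcdiarmid_function(n, p, t)
    assert point <= tail <= hm * (1 + 1e-12)
    for lam in (0.0, 0.25, 0.5, 0.75, 1.0):
        # a convex combination Q of the two quantities never exceeds H_M
        assert lam * hm + (1 - lam) * point <= hm * (1 + 1e-12)
```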


1.3 k-wise independence 

In this section we employ Theorem 1.4 in order to obtain a concentration inequality for a particular class of weakly dependent random variables. We begin by defining this notion of weak dependence. The random variables $X_1, \dots, X_n$ will be called $k$-wise independent if for any subset of $k$ indices $A = \{i_1, \dots, i_k\}$ and all real numbers $x_{i_1}, \dots, x_{i_k}$, we have

$$\mathbb{P}\big[X_{i_1} \le x_{i_1}, \dots, X_{i_k} \le x_{i_k}\big] = \prod_{i_j\in A}\mathbb{P}\big[X_{i_j} \le x_{i_j}\big].$$

ij&A 

$k$-wise independent random variables play a key role in theoretical computer science, where they are used for derandomizing algorithms (see jUj]). Note that 2-wise independent random variables are just pairwise independent random variables. Let us also mention two examples of $(n-1)$-wise independent random variables. Let $G$ be a graph on $n$ vertices. Suppose that each edge of $G$ is given a random orientation, with probability $1/2$ for each direction, independently of all other edges. For every $v \in G$, let $\delta_v = \deg^-(v) \bmod 2$, where $\deg^-(v)$ is the in-degree of vertex $v$. Then (see [30], Theorem 4) the random variables $\delta_v$, $v \in V$, are $(n-1)$-wise independent. Similarly, let $G$ be a random graph from $G(n, 1/2)$ and for every vertex $v \in G$, set $d_v = \deg(v) \bmod 2$. Then (see [28], Corollary 4.2) the random variables $d_v$, $v \in V$, are $(n-1)$-wise independent. For more sophisticated examples of $k$-wise independent random variables we refer the reader to Alon et al. [1] and Benjamini et al. [5].
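The first example above can be verified exactly on a small graph. In the Python sketch below, the graph (a 4-cycle with one chord, hence connected) is our own arbitrary choice; the code enumerates all equally likely orientations, computes the exact law of the parities $\delta_v$, and checks that every $n-1$ of them are independent while all $n$ of them are not.

```python
import itertools
from fractions import Fraction

# Hypothetical example: a 4-cycle with one chord, a connected graph on n = 4 vertices.
n = 4
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]

# Exact law of (delta_v)_v over all 2^|E| equally likely orientations.
law = {}
for bits in itertools.product((0, 1), repeat=len(edges)):
    indeg = [0] * n
    for (u, v), b in zip(edges, bits):
        indeg[v if b else u] += 1          # b = 1 orients the edge as u -> v
    parities = tuple(d % 2 for d in indeg)
    law[parities] = law.get(parities, 0) + Fraction(1, 2 ** len(edges))

def marginal(coords):
    """Joint law of the parities indexed by coords."""
    out = {}
    for outcome, prob in law.items():
        key = tuple(outcome[c] for c in coords)
        out[key] = out.get(key, 0) + prob
    return out

# Every n - 1 of the parities are mutually independent ...
for coords in itertools.combinations(range(n), n - 1):
    joint = marginal(coords)
    for values in itertools.product((0, 1), repeat=n - 1):
        product = Fraction(1)
        for c, x in zip(coords, values):
            product *= marginal((c,)).get((x,), 0)
        assert joint.get(values, 0) == product

# ... but all n of them are not: their sum always has the parity of |E|.
assert all(sum(outcome) % 2 == len(edges) % 2 for outcome in law)
```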


We shall be interested in concentration inequalities for sums of $k$-wise independent random variables. The problem of obtaining analogues of Hoeffding's Theorem 1.1 for sums of $k$-wise independent random variables has attracted the attention of several authors. See, for example, the works of Bellare et al. [4] and Schmidt et al. [33] and references therein. Among the several existing concentration inequalities, the following one is obtained via an approach that is similar to the approach of this paper.

Theorem 1.11 (Schmidt, Siegel, Srinivasan, 1995). Let Ai,..., A„ be random vari¬ 
ables such that 0 < Aj < 1 and E [Aj] = Pi,for each i = 1,. .., n. Set p = ^ X]* A- 
e > 0 and set := \. If k > k^, and Xi,..., Xn are k-wise independent then 



See [33] for a proof of this result, a basic ingredient of which is the use of elementary symmetric functions, defined as $S_j(X_1, \dots, X_n) = \sum_{1\le i_1<\cdots<i_j\le n}\prod_{l=1}^j X_{i_l}$, where $1 \le j \le n$. Clearly, the expectation of the function $S_k(X_1, \dots, X_n)$ is related to the definition of the random variable $Z$ in Theorem 1.4. In particular, the latter result yields the following analogue of Hoeffding's Theorem 1.1 for sums of $k$-wise independent random variables.


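Theorem 1.12. Let $X_1, \dots, X_n$ be $k$-wise independent random variables such that $0 \le X_i \le 1$ and $\mathbb{E}[X_i] = p$, for each $i = 1, \dots, n$. Fix $\varepsilon > 0$. Then

$$\mathbb{P}\Big[\sum_{i=1}^n X_i \ge np(1+\varepsilon)\Big] \le \left(\frac{1}{p-p^2}\right)^{n-k} e^{-nD(p(1+\varepsilon)\,\|\,p)}.$$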


Notice that the previous result reduces to Hoeffding's Theorem 1.1 when $k = n$, i.e. when the random variables are mutually independent. Notice also that Theorem 1.12 is useful for values of $p$ that are close to $\frac{1}{2}$ and rather large values of $k$. As a direct application of Theorem 1.3 one obtains the following special case of Theorem 1.11.
Theorem 1.13. Fix $p \in (0,1)$ and let $X_1, \dots, X_n$ be $k$-wise independent Bernoulli 0/1 random variables such that $\mathbb{E}[X_i] = p$, for each $i = 1, \dots, n$. Let $\varepsilon > 0$ be such that $np(1+\varepsilon)$ is a positive integer that satisfies $np(1+\varepsilon) > k$. Then

$$\mathbb{P}\Big[\sum_{i=1}^n X_i \ge np(1+\varepsilon)\Big] \le \frac{\binom{n}{k}p^k}{\binom{np(1+\varepsilon)}{k}}.$$

The proof of this result is immediate and so is omitted. In the next subsection we shall be concerned with a particular dependency structure between Bernoulli 0/1 random variables.
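Applying Theorem 1.3 with $\beta n = np(1+\varepsilon)$ and using $\sum_{|A|=k}\mathbb{E}[\prod_{i\in A}X_i] = \binom{n}{k}p^k$ (a consequence of $k$-wise independence) gives the bound $\binom{n}{k}p^k/\binom{np(1+\varepsilon)}{k}$. The Python sketch below (parameters arbitrary, `kwise_bound` and `kl` hypothetical helper names) tabulates it for a few values of $k$ next to the Chernoff-Hoeffding bound $e^{-nD(p(1+\varepsilon)\|p)}$; for $k = 1$ it is Markov's inequality.

```python
import math

def kwise_bound(n, p, eps, k):
    """C(n,k) p^k / C(np(1+eps), k); requires np(1+eps) to be an integer larger than k."""
    m = round(n * p * (1 + eps))
    assert math.isclose(m, n * p * (1 + eps)) and m > k
    return math.comb(n, k) * p ** k / math.comb(m, k)

def kl(q, p):
    """Kullback-Leibler distance D(q||p) between Bernoulli(q) and Bernoulli(p)."""
    return q * math.log(q / p) + (1 - q) * math.log((1 - q) / (1 - p))

n, p, eps = 1000, 0.1, 0.5               # threshold np(1+eps) = 150
hoeffding = math.exp(-n * kl(p * (1 + eps), p))
for k in (1, 5, 10, 20, 40):
    print(f"k = {k:2d}: bound = {kwise_bound(n, p, eps, k):.3e}, Chernoff-Hoeffding = {hoeffding:.3e}")
```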


1.4 Dependency graphs 

In this section we discuss yet another application of Theorem 1.4. We shall be concerned with sums of dependent Bernoulli random variables whose dependency structure is given in terms of a finite graph. Such a graph is referred to as a dependency graph and is defined in the following theorem. Dependency graphs are used in probabilistic combinatorics in order to prove the existence of "structures" with certain desired properties; a celebrated tool for proving such existence is the so-called Lovász Local Lemma (see [9]). Below we obtain a concentration bound regarding sums of Bernoulli random variables whose dependency structure is given in terms of a finite graph. Recall that the independence number of a finite graph is the cardinality of the largest set of vertices no two of which are adjacent.


Theorem 1.14. Let $G = (V, E)$ be a finite graph with vertices $v_1, \dots, v_n$ and let $\alpha$ be its independence number. To each $v_i$, $i = 1, \dots, n$, we associate a Bernoulli 0/1 random variable, $B_i$, such that $\mathbb{P}[B_i = 1] = \frac{1}{2}$. Suppose that each random variable $B_i$, $i = 1, \dots, n$, is independent of the set $\{B_j : \{v_i, v_j\} \notin E\}$. If $t$ is a real number such that $\frac{n}{2} < t < n$, then


P 


Y.Bi>t 




where $H(n, 1/2, t)$ is the Hoeffding function, defined in Theorem 1.1.


Notice that the previous result reduces to Hoeffding's theorem in case $\alpha = n$, i.e., when the random variables are independent. Notice also that the result is useful for rather large values of $\alpha$. In the following subsection we discuss an improvement upon a concentration inequality for U-statistics.
1.5 U-statistics


In this subsection we discuss an analogue of Hoeffding's Theorem 1.1 for a particular class of weakly dependent random variables. Before being more precise, let us fix some notation. We will denote by $[n]^d_<$ the set consisting of all ordered $d$-tuples from the set $[n]$; formally,

$$[n]^d_< = \{(i_1, \dots, i_d) \in [n]^d : 1 \le i_1 < \cdots < i_d \le n\}.$$

There are several instances in which one encounters sums of random variables of the form

$$X = \sum_{(i_1,\dots,i_d)\in[n]^d_<} F(\xi_{i_1}, \dots, \xi_{i_d}),$$

where $\xi_1, \dots, \xi_n$ are independent and identically distributed random variables and $F : \mathbb{R}^d \to [0,1]$ is a bounded function that depends only on the random vector $(\xi_{i_1}, \dots, \xi_{i_d})$. Clearly, provided $d > 1$, the random variable $X$ is a sum of dependent random variables. Such sums of random variables have been studied by several authors and are referred to as U-statistics. Again, as an example of U-statistics, the reader may think of the number of triangles in an Erdős-Rényi random graph, $G$, on $m$ vertices. In this case $d = 3$ and $n = \binom{m}{2}$. Note that every triplet of vertices from $G$ uniquely determines a triplet of potential edges. Hence we can set each $\xi_i$, $i = 1, \dots, n$, to be a Bernoulli random variable of parameter $p$ corresponding to the potential edges in $G$, and $F(\xi_{i_1}, \xi_{i_2}, \xi_{i_3})$ to be the indicator that the three potential edges corresponding to a triplet of vertices are all present in $G$, thus forming a triangle.
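To make the triangle example concrete, the following Python sketch (our illustration; `triangle_count` is a hypothetical helper and the parameters are arbitrary) samples $G(m,p)$ and counts triangles. It sums only over the edge triplets that come from a vertex triplet (for all other edge triplets the triangle indicator is zero), and compares the empirical mean with $\mathbb{E}[X] = \binom{m}{3}p^3$, which follows from linearity of expectation.

```python
import itertools
import math
import random

def triangle_count(m, p, rng):
    """Number of triangles in one sample of G(m, p), written as a sum over vertex triplets."""
    # xi[e] = 1 if the potential edge e is present (i.i.d. Bernoulli(p)).
    xi = {e: int(rng.random() < p) for e in itertools.combinations(range(m), 2)}
    total = 0
    for a, b, c in itertools.combinations(range(m), 3):
        # indicator that the three potential edges of the triplet are all present
        total += xi[(a, b)] * xi[(a, c)] * xi[(b, c)]
    return total

m, p, trials = 12, 0.3, 5000
rng = random.Random(0)
mean = sum(triangle_count(m, p, rng) for _ in range(trials)) / trials
exact = math.comb(m, 3) * p ** 3          # each vertex triplet contributes p^3
print(f"empirical mean {mean:.3f}, exact mean {exact:.3f}")
```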

U-statistics form a class of unbiased estimators, introduced by Hoeffding [16], that has attracted considerable attention; see for example the works of Arcones [2], Giné et al. Ifl^, Hoeffding [16], Janson [20] and Joly et al. [21], just to name a few references. Let us bring to the reader's attention the following concentration inequality for U-statistics, which is due to Hoeffding (see [18], Section 5; see also Janson [20], Section 4). In order to avoid dealing with any rounding issues, we state the result for the case in which $d$ divides $n$, i.e. $n = k \cdot d$, for some $k \in \{1, 2, \dots\}$.

Theorem 1.15 (Hoeffding, 1963). Let $d, n$ be positive integers such that $d$ divides $n$, i.e. $n = k \cdot d$, for some positive integer $k$. Suppose that $X$ is a random variable that can be written in the form

$$X = \sum_{(i_1,\dots,i_d)\in[n]^d_<} F(\xi_{i_1}, \dots, \xi_{i_d}),$$

where $\xi_1, \dots, \xi_n$ are independent and identically distributed random variables and $F : \mathbb{R}^d \to [0,1]$ is a bounded function. Set $p := \mathbb{E}[F(\xi_1, \dots, \xi_d)]$ and $N_d := \binom{n}{d}/k$. If $y = \mathbb{E}[X] + t\binom{n}{d}$, for some $t \in (0, 1-p)$, then

$$\mathbb{P}[X \ge y] \le \inf_{h>0} e^{-hy}\,\mathbb{E}\big[e^{hN_dB_{k,p}}\big],$$

where $B_{k,p}$ is a binomial $\mathrm{Bin}(k,p)$ random variable. Furthermore,

$$\inf_{h>0} e^{-hy}\,\mathbb{E}\big[e^{hN_dB_{k,p}}\big] \le \exp(-2kt^2).$$

We remark that a similar statement holds true for sums of independent random variables and has been the content of prior work (see lESl). By exploiting convexity, combined with similar ideas as in Section 1.2, we deduce the following refined version of the previous theorem.
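With $y = \binom{n}{d}(p+t)$ and $N_d = \binom{n}{d}/k$, which is our reading of Theorem 1.15, the substitution $s = hN_d$ turns the infimum into the Hoeffding function $H(k, p, k(p+t))$ of the binomial $B_{k,p}$, which is at most $\exp(-2kt^2)$. The Python sketch below checks this numerically for one arbitrary choice of $n$, $d$, $p$, $t$ (helper names are hypothetical).

```python
import math

def binom_pmf(k, p, j):
    """P[B_{k,p} = j]."""
    return math.comb(k, j) * p ** j * (1 - p) ** (k - j)

def ustat_mgf_bound(n, d, p, t, h):
    """e^{-hy} E[exp(h N_d B_{k,p})], with k = n/d, N_d = C(n,d)/k and y = C(n,d)(p + t)."""
    k = n // d
    n_d = math.comb(n, d) / k
    y = math.comb(n, d) * (p + t)
    return sum(binom_pmf(k, p, j) * math.exp(h * (n_d * j - y)) for j in range(k + 1))

n, d, p, t = 60, 3, 0.2, 0.15
k = n // d                                  # k = 20
n_d = math.comb(n, d) / k
# Parametrise h = s / N_d so that the grid lives on a natural scale.
best = min(ustat_mgf_bound(n, d, p, t, s / n_d) for s in (i / 100 for i in range(1, 500)))
# Hoeffding function H(k, p, k(p + t)) of the binomial B_{k,p}.
hoeffding = (p / (p + t)) ** (k * (p + t)) * ((1 - p) / (1 - p - t)) ** (k * (1 - p - t))
assert hoeffding * (1 - 1e-9) <= best <= hoeffding * (1 + 1e-3)
assert best <= math.exp(-2 * k * t * t)
```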

Theorem 1.16. Let $d, n$ be positive integers such that $d$ divides $n$, i.e. $n = k \cdot d$, for some positive integer $k$. Suppose that $X$ is a random variable that can be written in the form

$$X = \sum_{(i_1,\dots,i_d)\in[n]^d_<} F(\xi_{i_1}, \dots, \xi_{i_d}),$$

where $\xi_1, \dots, \xi_n$ are independent and identically distributed random variables and $F : \mathbb{R}^d \to [0,1]$ is a bounded function. Set $p := \mathbb{E}[F(\xi_1, \dots, \xi_d)]$, $N_d := \binom{n}{d}/k$, and denote by $\mathcal{F}$ the set consisting of all functions $f : \mathbb{R} \to [0, +\infty)$ that are increasing and convex. If $y = \mathbb{E}[X] + t\binom{n}{d}$, for some $t \in (0, 1-p)$, then

$$\mathbb{P}[X \ge y] \le \inf_{f\in\mathcal{F}}\frac{\mathbb{E}[f(N_d \cdot B_{k,p})]}{f(y)},$$

where $B_{k,p}$ is a binomial $\mathrm{Bin}(k,p)$ random variable. Moreover, if $t$ belongs to the interval
, 1 — pj and k{p + t) is a positive integer/rom the interval {kp, k), we have 

inf [/ (iV, . Bt,,)] < . (exp (-IkC) - T,(k, p, ()) 

where 

k{p+t) — l 

T,{k,p,t):= e^^^‘^^-y^¥[Bk,p = j] 

j=0 
and $h$ is the positive real satisfying

$$e^{-hy}\,\mathbb{E}\big[e^{hN_dB_{k,p}}\big] = \inf_{s>0} e^{-sy}\,\mathbb{E}\big[e^{sN_dB_{k,p}}\big].$$

The latter bound is strictly less than the bound of Theorem 1.15.

In other words, the previous result improves upon Theorem 1.15 by adjusting a "missing factor" that is less than 1. We prove this result in Section 6. The following five sections are devoted to the proofs of the statements we discussed so far. Finally, in Section 7 we present some applications to the theory of random graphs.


2 Covariance estimates 

2.1 Proofs of Theorems 1.5 and 1.4

We begin with the following lemma, in which we collect some properties of the random variables $Z_A$ defined in Theorem 1.5. Recall that $\partial_j[n]$ denotes the family consisting of all subsets of $[n]$ of cardinality $j \in \{0, 1, \dots, n\}$.

Lemma 2.1. Fix a positive integer $n$ and let $x_1, \dots, x_n$ be real numbers from the interval $[0,1]$. For every $A \subseteq [n]$ let $\zeta_A$ be defined as

$$\zeta_A = \prod_{i\in A}x_i\prod_{j\in[n]\setminus A}(1-x_j).$$

Then

$$\sum_{A\subseteq[n]}\zeta_A = \sum_{j=0}^n\sum_{A\in\partial_j[n]}\zeta_A = 1 \quad\text{and}\quad \sum_{i=1}^n x_i = \sum_{j=0}^n j\sum_{A\in\partial_j[n]}\zeta_A.$$

Proof. The proofs of both statements are by induction on $n$. The first statement is clearly true for $n = 1$. Assuming that it holds true for $n-1$, we prove it for $n > 1$. Given a set $B \subseteq [n-1]$, we define $\zeta_B = \prod_{i\in B}x_i\prod_{j\in[n-1]\setminus B}(1-x_j)$. Now notice that we can write

$$\sum_{j=0}^n\sum_{A\in\partial_j[n]}\zeta_A = \sum_{j=1}^n\sum_{n\in A\in\partial_j[n]}\zeta_A + \sum_{j=0}^{n-1}\sum_{n\notin A\in\partial_j[n]}\zeta_A,$$

where summation over $n \notin A \in \partial_j[n]$ means that the sum runs over those $A \in \partial_j[n]$ that do not contain $n$; similarly, summation over $n \in A \in \partial_j[n]$ means that the sum runs over those $A \in \partial_j[n]$ that contain $n$. Now each term $\zeta_A$ in the first sum on the right hand side contains the factor $x_n$, and each term in the second sum contains the factor $1 - x_n$. This implies that

$$\sum_{j=1}^n\sum_{n\in A\in\partial_j[n]}\zeta_A + \sum_{j=0}^{n-1}\sum_{n\notin A\in\partial_j[n]}\zeta_A = x_n\sum_{j=0}^{n-1}\sum_{B\in\partial_j[n-1]}\zeta_B + (1-x_n)\sum_{j=0}^{n-1}\sum_{B\in\partial_j[n-1]}\zeta_B.$$

The inductive hypothesis finishes the proof of the first statement. The proof of the second statement is similar. It is clearly true for $n = 1$; assuming that it holds true for $n-1$, we prove it for $n > 1$. Notice that we can write

$$\sum_{j=0}^n j\sum_{A\in\partial_j[n]}\zeta_A = \sum_{j=0}^{n-1} j\sum_{n\notin A\in\partial_j[n]}\zeta_A + \sum_{j=1}^n j\sum_{n\in A\in\partial_j[n]}\zeta_A.$$

The inductive hypothesis implies that the first addend on the right hand side of the last equation can be written as

$$\sum_{j=0}^{n-1} j\sum_{n\notin A\in\partial_j[n]}\zeta_A = (1-x_n)\sum_{j=0}^{n-1} j\sum_{B\in\partial_j[n-1]}\zeta_B = (1-x_n)\sum_{i=1}^{n-1}x_i.$$

The second addend can be written as

$$\sum_{j=1}^n j\sum_{n\in A\in\partial_j[n]}\zeta_A = x_n\sum_{j=0}^{n-1}(j+1)\sum_{B\in\partial_j[n-1]}\zeta_B = x_n\sum_{j=0}^{n-1} j\sum_{B\in\partial_j[n-1]}\zeta_B + x_n\sum_{j=0}^{n-1}\sum_{B\in\partial_j[n-1]}\zeta_B = x_n\sum_{i=1}^{n-1}x_i + x_n,$$

where the last equality comes from the inductive hypothesis and the first statement. Adding up the expressions in the last two equations yields the result. □
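Lemma 2.1 is easy to test numerically. Note that the weights $\sum_{A\in\partial_j[n]}\zeta_A$ are exactly the probabilities that a sum of independent Bernoulli$(x_i)$ random variables equals $j$. A short Python sketch with randomly drawn $x_i$ (the seed and $n$ are arbitrary; `zeta` is a hypothetical helper):

```python
import itertools
import math
import random

def zeta(xs, A):
    """zeta_A = prod_{i in A} x_i * prod_{j not in A} (1 - x_j)."""
    return math.prod(x if i in A else 1 - x for i, x in enumerate(xs))

rng = random.Random(1)
xs = [rng.random() for _ in range(7)]
n = len(xs)
# weights[j] = sum of zeta_A over all subsets A of cardinality j
weights = [sum(zeta(xs, set(A)) for A in itertools.combinations(range(n), j)) for j in range(n + 1)]

assert math.isclose(sum(weights), 1.0, rel_tol=1e-12)                                  # first statement
assert math.isclose(sum(j * w for j, w in enumerate(weights)), sum(xs), rel_tol=1e-12)  # second statement
```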


A basic ingredient in the proofs of most results in this paper is Theorem 1.4, which we are now in a position to prove.
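Proof of Theorem 1.4. The claim that $\sum_{A\subseteq[n]}\mathbb{E}[Z_A] = 1$ follows from the previous lemma. Fix a function $f \in \mathcal{F}$. Since $f$ is non-negative and increasing, Markov's inequality yields

$$\mathbb{P}\Big[\sum_{i=1}^n X_i \ge t\Big] \le \mathbb{P}\Big[f\Big(\sum_{i=1}^n X_i\Big) \ge f(t)\Big] \le \frac{1}{f(t)}\,\mathbb{E}\Big[f\Big(\sum_{i=1}^n X_i\Big)\Big].$$

Now Lemma 2.1 implies that

$$\sum_{i=1}^n X_i = \sum_{j=0}^n j\sum_{A\in\partial_j[n]}Z_A,$$

and so, since $\sum_{A\subseteq[n]}Z_A = 1$ and $Z_A \ge 0$, it follows that $\sum_{i=1}^n X_i$ is a convex combination of the set of integers $\{0, 1, \dots, n\}$. Since $f$ is convex, we conclude

$$f\Big(\sum_{i=1}^n X_i\Big) \le \sum_{j=0}^n\sum_{A\in\partial_j[n]}Z_A\,f(j),$$

which, in turn, implies that

$$\mathbb{E}\Big[f\Big(\sum_{i=1}^n X_i\Big)\Big] \le \sum_{j=0}^n\sum_{A\in\partial_j[n]}\mathbb{E}[Z_A]\,f(j).$$

The result follows. □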
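The proof can be traced on a toy example. The Python sketch below uses a hypothetical joint law in which $X_1 = X_2$ are fully dependent, computes the law of $Z$ from Theorem 1.4 and checks that $\mathbb{E}[f(Z)]/f(t)$ dominates the exact tail for $f(x) = e^{hx}$ and a few values of $h$.

```python
import itertools
import math

def zeta(xs, A):
    """Z_A evaluated at one outcome xs: prod_{i in A} x_i * prod_{j not in A} (1 - x_j)."""
    return math.prod(x if i in A else 1 - x for i, x in enumerate(xs))

# Hypothetical toy law: X = (U, U, V), U uniform on {0, 0.5, 1}, V ~ Bernoulli(0.3),
# with U and V independent; X_1 and X_2 are fully dependent.
outcomes = [((u, u, v), pv / 3) for u in (0.0, 0.5, 1.0) for v, pv in ((0.0, 0.7), (1.0, 0.3))]
n = 3

# Law of Z from Theorem 1.4: P[Z = j] = sum over |A| = j of E[Z_A].
law_z = [
    sum(prob * zeta(xs, set(A)) for xs, prob in outcomes for A in itertools.combinations(range(n), j))
    for j in range(n + 1)
]

t = 2.5                                        # t >= np = 1.3
exact_tail = sum(prob for xs, prob in outcomes if sum(xs) >= t)
for h in (0.5, 1.0, 2.0, 4.0):
    # f(x) = exp(hx) is increasing, convex and non-negative
    bound = sum(law_z[j] * math.exp(h * j) for j in range(n + 1)) / math.exp(h * t)
    assert exact_tail <= bound
```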


We can now proceed with the proof of Theorem 1.5.

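Proof of Theorem 1.5. Since $f(x) = e^{hx}$, $h > 0$, is a convex, increasing and non-negative function, Theorem 1.4 and the hypothesized estimate on $\mathbb{E}[Z_A]$ yield

$$\mathbb{P}\Big[\sum_{i=1}^n X_i \ge t\Big] \le e^{-ht}\sum_{j=0}^n\sum_{A\in\partial_j[n]}\mathbb{E}[Z_A]\,e^{hj} \le e^{-ht}\sum_{j=0}^n\binom{n}{j}\gamma^j\delta^{n-j}e^{hj} = e^{-ht}(\delta + \gamma e^h)^n, \quad \text{for } h > 0,$$

where the last equality follows from the binomial theorem. If we minimise the last expression with respect to $h > 0$, we get $e^h = \frac{t\delta}{\gamma(n-t)}$ (this value is admissible, i.e. $h > 0$, because $t > n\gamma$ and $\gamma + \delta \ge 1$). Therefore

$$\mathbb{P}\Big[\sum_{i=1}^n X_i \ge t\Big] \le \gamma^t\delta^{n-t}\left(\frac{n}{t}\right)^t\left(\frac{n}{n-t}\right)^{n-t},$$

and the first statement follows. Since $t = n\gamma + n\gamma\varepsilon$, we can write the right hand side of the last inequality as

$$\left(\frac{\gamma}{\gamma+\gamma\varepsilon}\right)^{n(\gamma+\gamma\varepsilon)}\left(\frac{\delta}{1-\gamma-\gamma\varepsilon}\right)^{n(1-\gamma-\gamma\varepsilon)},$$

which in turn is equal to

$$\left[\left(\frac{\gamma}{\gamma+\gamma\varepsilon}\right)^{\gamma+\gamma\varepsilon}\left(\frac{1-\gamma}{1-\gamma-\gamma\varepsilon}\right)^{1-\gamma-\gamma\varepsilon}\left(\frac{\delta}{1-\gamma}\right)^{1-\gamma-\gamma\varepsilon}\right]^n = e^{-n\left(D(\gamma(1+\varepsilon)\,\|\,\gamma) - (1-\gamma(1+\varepsilon))\ln\frac{\delta}{1-\gamma}\right)}$$

and proves the result. □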
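The minimisation step can be checked numerically. The Python sketch below (parameters arbitrary, chosen so that $t = n\gamma(1+\varepsilon)$ with $\varepsilon = 2/3$; `objective` is a hypothetical helper) evaluates $e^{-ht}(\delta+\gamma e^h)^n$ at $e^h = t\delta/(\gamma(n-t))$, compares it with the closed form, and confirms that nearby values of $h$ do no better.

```python
import math

def objective(h, n, gamma, delta, t):
    """e^{-ht} (delta + gamma e^h)^n, the quantity minimised in the proof."""
    return math.exp(-h * t) * (delta + gamma * math.exp(h)) ** n

n, gamma, delta = 50, 0.3, 0.75
t = 25.0                                        # t = n*gamma*(1 + eps) with eps = 2/3
h_star = math.log(t * delta / (gamma * (n - t)))
closed = gamma ** t * delta ** (n - t) * (n / t) ** t * (n / (n - t)) ** (n - t)

assert h_star > 0                               # t > n*gamma and gamma + delta >= 1
assert math.isclose(objective(h_star, n, gamma, delta, t), closed, rel_tol=1e-9)
for shift in (-0.2, -0.05, 0.05, 0.2):
    assert objective(h_star + shift, n, gamma, delta, t) >= closed * (1 - 1e-12)
```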


Likewise, Theorem 1.3 can be obtained from Theorem 1.4 by suitably choosing a function $f \in \mathcal{F}$.


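Proof of Theorem 1.3. We apply Theorem 1.4 to a suitably chosen function. Given a positive integer $k$ such that $0 < k < \beta n$, define the sequence $\{a_m\}_{m=0}^n$ by setting $a_m = 0$, for $m \in \{0, 1, \dots, k-1\}$, and $a_m = \binom{m}{k}$, for $m \in \{k, k+1, \dots, n\}$. Now let $g(x)$ be the function defined by setting $g(m) = a_m$, for $m \in \{0, 1, \dots, n\}$, with $g(\cdot)$ linear between consecutive values $g(m), g(m+1)$. It is easy to see, by comparing slopes, that $g$ is convex: for every $m \in \{0, 1, \dots, n-1\}$ we have $a_{m+1} - a_m = \binom{m}{k-1}$, which is non-decreasing in $m$; equivalently, each point $(m, a_m)$ lies on or below the line passing through the points $(m-1, a_{m-1})$ and $(m+1, a_{m+1})$. This implies that $g(x)$ is convex, increasing and non-negative, and so Theorem 1.4 yields

$$\mathbb{P}\Big[\sum_{i=1}^n X_i \ge \beta n\Big] \le \frac{1}{\binom{\beta n}{k}}\sum_{j=k}^n\binom{j}{k}\sum_{A\in\partial_j[n]}\mathbb{E}[Z_A].$$

Since the random variables $X_1, \dots, X_n$ are indicators, the result follows upon observing that

$$\sum_{j=k}^n\binom{j}{k}\sum_{T\in\partial_j[n]}\mathbb{E}[Z_T] = \sum_{A:|A|=k}\sum_{T:A\subseteq T}\mathbb{E}[Z_T] = \sum_{A:|A|=k}\mathbb{E}\Big[\prod_{i\in A}X_i\Big].$$

□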
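Both ingredients of the proof can be checked on a small example: the slopes of $g$ are $\binom{m}{k-1}$, a non-decreasing sequence, and the resulting bound dominates the tail. In the Python sketch below the joint law on $\{0,1\}^n$ is an arbitrary random (hence generally dependent) choice of ours.

```python
import itertools
import math
import random

n, k, beta_n = 6, 2, 4                   # 0 < k < beta*n

# Slopes of the piecewise linear g: a_{m+1} - a_m = C(m, k-1), a non-decreasing sequence.
a = [math.comb(m, k) if m >= k else 0 for m in range(n + 1)]
slopes = [a[m + 1] - a[m] for m in range(n)]
assert slopes == [math.comb(m, k - 1) for m in range(n)]
assert all(s <= s_next for s, s_next in zip(slopes, slopes[1:]))

# Compare the tail with the bound of Theorem 1.3 for an arbitrary law on {0,1}^n.
rng = random.Random(2)
cube = list(itertools.product((0, 1), repeat=n))
raw = [rng.random() ** 3 for _ in cube]
total = sum(raw)
law = {x: w / total for x, w in zip(cube, raw)}

tail = sum(p for x, p in law.items() if sum(x) >= beta_n)
moments = sum(
    sum(p for x, p in law.items() if all(x[i] for i in A))
    for A in itertools.combinations(range(n), k)
)
bound = moments / math.comb(beta_n, k)
assert tail <= bound + 1e-12
```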
We proceed with yet another proof of the previous result and a corresponding lower bound.


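Proof of Theorem 1.7. Given an outcome of the random variables $X_1,\ldots,X_n$, define $H_k$ to be the random variable that counts the number of indices $i \in \{1,\ldots,n\}$ for which $X_i = 1$ in $k$ draws without replacement from the set of indices $\{1,\ldots,n\}$. Notice that $H_k$ is a mixture of hypergeometric distributions. Now we can write

$$ \mathbb{P}[H_k = k] = \sum_{j=k}^{n}\mathbb{P}\left[H_k = k \,\middle|\, \sum_{i=1}^{n}X_i = j\right]\mathbb{P}\left[\sum_{i=1}^{n}X_i = j\right] = \sum_{j=k}^{n}\frac{\binom{j}{k}}{\binom{n}{k}}\,\mathbb{P}\left[\sum_{i=1}^{n}X_i = j\right] \geq \sum_{j=\beta n}^{n}\frac{\binom{j}{k}}{\binom{n}{k}}\,\mathbb{P}\left[\sum_{i=1}^{n}X_i = j\right] \geq \frac{\binom{\beta n}{k}}{\binom{n}{k}}\,\mathbb{P}\left[\sum_{i=1}^{n}X_i \geq \beta n\right]. $$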

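For $T \subseteq [n]$ set, as usual, $Z_T = \prod_{i\in T}X_i\prod_{j\in[n]\setminus T}(1-X_j)$. The upper bound follows upon observing that

$$ \mathbb{P}[H_k = k] = \sum_{j=k}^{n}\frac{\binom{j}{k}}{\binom{n}{k}}\,\mathbb{P}\left[\sum_{i=1}^{n}X_i = j\right] = \frac{1}{\binom{n}{k}}\sum_{j=k}^{n}\binom{j}{k}\sum_{T:|T|=j}\mathbb{E}[Z_T] = \frac{1}{\binom{n}{k}}\sum_{A:|A|=k}\ \sum_{T:A\subseteq T}\mathbb{E}[Z_T] = \frac{1}{\binom{n}{k}}\sum_{A:|A|=k}\mathbb{E}\left[\prod_{i\in A}X_i\right]. $$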


We now proceed with the proof of the lower bound. Given an outcome of the random variables $X_1,\ldots,X_n$, define $H_{\beta n}$ to be the random variable that counts the number of indices $i \in \{1,\ldots,n\}$ for which $X_i = 1$ in $\beta n$ draws without replacement from the set of indices $\{1,\ldots,n\}$. Notice that

$$ \mathbb{P}\left[\sum_{i=1}^{n}X_i \geq \beta n\right] \geq \mathbb{P}[H_{\beta n} = \beta n], $$

since $H_{\beta n} = \beta n$ is only possible when at least $\beta n$ of the $X_i$ equal 1, and so it is enough to estimate $\mathbb{P}[H_{\beta n} = \beta n]$ from below. Now, using a similar computation as before, we can write


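$$ \mathbb{P}[H_{\beta n} = \beta n] = \sum_{j=\beta n}^{n}\frac{\binom{j}{\beta n}}{\binom{n}{\beta n}}\,\mathbb{P}\left[\sum_{i=1}^{n}X_i = j\right] = \frac{1}{\binom{n}{\beta n}}\sum_{j=\beta n}^{n}\binom{j}{\beta n}\sum_{T:|T|=j}\mathbb{E}[Z_T] = \frac{1}{\binom{n}{\beta n}}\sum_{A:|A|=\beta n}\mathbb{E}\left[\prod_{i\in A}X_i\right]. $$

The result follows.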


□ 
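A brute-force sketch of the two-sided estimate just proved, on an arbitrary dependent law of $n = 6$ indicators (the law, the seed and the helper names are illustrative assumptions). It computes $\mathbb{P}[H_k = k]$ directly from its definition, by averaging over all $k$-subsets of indices that could be drawn, and compares it with the closed form obtained above.

```python
import itertools
import math
import random


def random_joint_law(n, seed):
    """An arbitrary (dependent) probability distribution on {0,1}^n."""
    rng = random.Random(seed)
    outcomes = list(itertools.product((0, 1), repeat=n))
    weights = [rng.random() for _ in outcomes]
    total = sum(weights)
    return {x: w / total for x, w in zip(outcomes, weights)}


def prob_all_draws_are_ones(law, n, k):
    """P[H_k = k]: k indices drawn without replacement all satisfy X_i = 1."""
    draws = list(itertools.combinations(range(n), k))
    return sum(
        prob * sum(all(x[i] for i in D) for D in draws) / len(draws)
        for x, prob in law.items()
    )


def product_moments(law, n, k):
    """sum_{|A|=k} E[prod_{i in A} X_i]."""
    return sum(
        sum(prob for x, prob in law.items() if all(x[i] for i in A))
        for A in itertools.combinations(range(n), k)
    )


def tail(law, threshold):
    """P[X_1 + ... + X_n >= threshold]."""
    return sum(prob for x, prob in law.items() if sum(x) >= threshold)


n, k, beta_n = 6, 2, 4
law = random_joint_law(n, seed=3)
lower = product_moments(law, n, beta_n) / math.comb(n, beta_n)
upper = product_moments(law, n, k) / math.comb(beta_n, k)
print(lower, tail(law, beta_n), upper)
```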


We end this section with the proof of Theorem 1.8. The proof will require the following classical result.


Theorem 2.2 (Hoeffding, 1956). Let $B_1,\ldots,B_n$ be independent Bernoulli 0/1 random variables whose means equal $p_1,\ldots,p_n$, respectively. Then

$$ \mathbb{P}\left[\sum_{i=1}^{n}B_i \geq b\right] \geq \mathbb{P}[B_{n,p} \geq b], $$

when $0 \leq b \leq np$, where $p = \frac{1}{n}\sum_{i=1}^{n}p_i$ and $B_{n,p}$ is a binomial random variable of parameters $n$ and $p$.


Proof. See [17], Theorem 4.


□ 


The following proof is similar to the proof of Theorem 3.1 from [19]. See also Theorem 2 in Siegel [?].


Proof of Theorem 1.8. We may assume that $\mathbb{P}[\sum_{i=1}^{n}X_i \geq t] > 0$. For every outcome of the random variables $X_1,\ldots,X_n$, let $B_i$ be a Bernoulli $\mathrm{Ber}(X_i)$ random variable, $i = 1,\ldots,n$. That is, given an outcome of $X_i$, $i = 1,\ldots,n$, we toss $n$ independent 0/1 coins such that the $i$-th coin lands on 1 with probability $X_i$. Notice that, given $X_1,\ldots,X_n$, the mean of $\sum_{i=1}^{n}B_i$ equals $\sum_{i=1}^{n}X_i$ and so $\mathbb{E}[\sum_{i=1}^{n}B_i] = np$. Furthermore, given $X_1,\ldots,X_n$, define $B_n$ to be a binomial random variable of parameters $n$ and $\frac{1}{n}\sum_{i=1}^{n}X_i$; thus $\mathbb{E}[B_n] = np$ as well. Since $\mathbb{P}[\sum_{i=1}^{n}X_i \geq t] > 0$ we can write

$$ \mathbb{P}\left[\sum_{i=1}^{n}B_i \geq t-1 \,\middle|\, \sum_{i=1}^{n}X_i \geq t\right] \leq \frac{\mathbb{P}\left[\sum_{i=1}^{n}B_i \geq t-1\right]}{\mathbb{P}\left[\sum_{i=1}^{n}X_i \geq t\right]} $$

and so it is enough to estimate the probability on the left hand side from below. Now, given that $\sum_{i=1}^{n}X_i \geq t$, the mean of $\sum_{i=1}^{n}B_i$ is greater than or equal to $t$ and so Theorem 2.2 implies that


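$$ \mathbb{P}\left[\sum_{i=1}^{n}B_i \geq t-1 \,\middle|\, \sum_{i=1}^{n}X_i \geq t\right] \geq \mathbb{P}\left[B_n \geq t-1 \,\middle|\, \sum_{i=1}^{n}X_i \geq t\right]. $$

It is well-known (see Kaas et al. [22]) that a median of a binomial distribution of parameters $n$ and $p$ is greater than or equal to $np - 1$. This implies that, given $\sum_{i=1}^{n}X_i \geq t$, the probability that $B_n$ is greater than or equal to $t - 1$ is at least $1/2$. Summarising, we have shown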


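$$ \mathbb{P}\left[\sum_{i=1}^{n}X_i \geq t\right] \leq 2\,\mathbb{P}\left[\sum_{i=1}^{n}B_i \geq t-1\right] $$

and, since we assume $t - 1 > np$, we may apply Hoeffding's Theorem 1.1 to $\mathbb{P}[\sum_{i=1}^{n}B_i \geq t-1]$ and conclude the result. □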
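The proof relies on the fact that, given $\sum_i X_i \geq t$, a binomial variable with mean $np \geq t$ is at least $t - 1$ with probability at least $1/2$. The following sketch checks the underlying statement $\mathbb{P}[B_{n,p} \geq np - 1] \geq 1/2$ by exact summation over a grid of $(n, p)$; the grid is an arbitrary choice, and this is a numerical illustration rather than a proof.

```python
import math


def binom_sf(n, p, x):
    """P[B_{n,p} >= x] for a real threshold x, by exact summation."""
    start = max(0, math.ceil(x))
    return sum(math.comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(start, n + 1))


def worst_case(n_max=40, grid=200):
    """Smallest value of P[B_{n,p} >= np - 1] over n <= n_max and p on a grid in (0, 1)."""
    worst = 1.0
    for n in range(1, n_max + 1):
        for i in range(1, grid):
            p = i / grid
            worst = min(worst, binom_sf(n, p, n * p - 1))
    return worst


print(worst_case())
```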


2.2 Independent random variables - Proof of Theorem 1.1


In this section we show that our main result can be seen as a generalisation of Hoeffding's theorem. In particular we provide, as a consequence of Theorem 1.4, a proof of Theorem 1.1. Notice that, in case the random variables $X_1,\ldots,X_n$ are independent, we have

$$ \sum_{A\in\partial_j[n]}\mathbb{E}[Z_A] = \sum_{A\in\partial_j[n]}\mathbb{E}\left[\prod_{i\in A}X_i\prod_{i\in[n]\setminus A}(1-X_i)\right] = \sum_{A\in\partial_j[n]}\prod_{i\in A}\mathbb{E}[X_i]\prod_{i\in[n]\setminus A}\left(1-\mathbb{E}[X_i]\right) = \mathbb{P}[H(p_1,\ldots,p_n) = j], $$

where $p_i = \mathbb{E}[X_i]$, $i = 1,\ldots,n$, and $H(p_1,\ldots,p_n)$ is the random variable that counts the number of successes in $n$ independent trials where, for $i = 1,\ldots,n$, the $i$-th trial has probability of success $p_i$. The proof of Theorem 1.1 will be based on the following well-known result.


Theorem 2.3 (Hoeffding, 1956). Let $H(p_1,\ldots,p_n)$ be the random variable defined above and set $p = \frac{1}{n}\sum_{i=1}^{n}p_i$. Let $f:\mathbb{R}\to\mathbb{R}$ be a convex function. Then

$$ \mathbb{E}[f(H(p_1,\ldots,p_n))] \leq \mathbb{E}[f(B_{n,p})], $$

where $B_{n,p}$ is a binomial random variable of parameters $n$ and $p$.
Proof. See [17], Theorem 3. □

We can now provide yet another proof of Hoeffding's result.


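Proof of Theorem 1.1. The proof is similar to the proof of Theorem 1.5. We have already seen that

$$ \sum_{A\in\partial_j[n]}\mathbb{E}[Z_A] = \mathbb{P}[H(p_1,\ldots,p_n) = j]. $$

Therefore, Theorems 1.4 and 2.3 together with the binomial theorem yield

$$ \mathbb{P}\left[\sum_{i=1}^{n}X_i \geq t\right] \leq e^{-ht}\sum_{j=0}^{n}\mathbb{P}[H(p_1,\ldots,p_n) = j]\,e^{hj} \leq e^{-ht}\sum_{j=0}^{n}\binom{n}{j}p^{j}(1-p)^{n-j}e^{hj} = e^{-ht}\left(1-p+pe^{h}\right)^{n}, \quad \text{for } h>0. $$

The result follows upon minimising the last expression with respect to $h > 0$.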


□ 
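A small numerical illustration of Theorem 2.3 and of the bound just obtained, assuming arbitrary success probabilities $p_i$ (not taken from the text). The law of $H(p_1,\ldots,p_n)$ is computed exactly by the usual convolution recursion, and the minimised bound is written in its closed form $e^{-nD(t/n\,\|\,p)}$, attained at $e^{h} = \frac{t(1-p)}{p(n-t)}$.

```python
import math


def poisson_binomial_pmf(ps):
    """Exact law of H(p_1, ..., p_n): number of successes in independent trials."""
    pmf = [1.0]
    for p in ps:
        new = [0.0] * (len(pmf) + 1)
        for j, q in enumerate(pmf):
            new[j] += q * (1 - p)  # trial fails
            new[j + 1] += q * p  # trial succeeds
        pmf = new
    return pmf


def binomial_pmf(n, p):
    return [math.comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(n + 1)]


def expectation(pmf, f):
    return sum(q * f(j) for j, q in enumerate(pmf))


def hoeffding_bound(n, p, t):
    """min_{h>0} e^{-ht}(1-p+pe^h)^n = exp(-n D(t/n || p)), valid for np < t < n."""
    a = t / n
    return math.exp(-n * (a * math.log(a / p) + (1 - a) * math.log((1 - a) / (1 - p))))


ps = [0.1, 0.5, 0.3, 0.9, 0.2]  # illustrative choice
n, p = len(ps), sum(ps) / len(ps)
H, B = poisson_binomial_pmf(ps), binomial_pmf(n, p)
convex_fs = [math.exp, lambda x: max(0.0, x - 3), lambda x: (x - 2) ** 2]
for f in convex_fs:
    print(expectation(H, f), "<=", expectation(B, f))
t = 4
print(sum(H[t:]), "<=", hoeffding_bound(n, p, t))
```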


3 Martingales - Proof of Theorem 1.10

In this section we prove Theorem 1.10. Before doing so, we need to be able to estimate expectations of certain products involving martingale difference sequences. This is the content of the following result.

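Lemma 3.1. Let $Y_1,\ldots,Y_n$ be a martingale difference sequence with $-p_i \leq Y_i \leq 1-p_i$, for $i = 1,\ldots,n$ and suitable constants $p_i \in (0,1)$. Fix a subset $A \subseteq [n]$. Then

$$ \mathbb{E}\left[\prod_{i\in A}(Y_i+p_i)\prod_{i\in[n]\setminus A}(1-p_i-Y_i)\right] \leq \prod_{i\in A}p_i\prod_{i\in[n]\setminus A}(1-p_i). $$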


Proof. The proof is by induction on $n$. For $n = 1$ the statement is clearly true. Assuming that it holds true for $n-1$, we prove it for $n$. Let $A$ be a subset of $[n]$. There are two cases to consider: either $n \in A$ or $n \notin A$. In the first case, the tower property of conditional expectations yields

$$ \mathbb{E}\left[\prod_{i\in A}(Y_i+p_i)\prod_{i\in[n]\setminus A}(1-p_i-Y_i)\right] = \mathbb{E}\left[\mathbb{E}\left[(Y_n+p_n)\prod_{i\in A,\, i\neq n}(Y_i+p_i)\prod_{i\in[n-1]\setminus A}(1-p_i-Y_i)\,\middle|\,\mathcal{F}_{n-1}\right]\right], $$

where $\mathcal{F}_{n-1}$ denotes the $\sigma$-algebra generated by the random variables $Y_1,\ldots,Y_{n-1}$. Since $Y_1,\ldots,Y_n$ is a martingale difference sequence, $\mathbb{E}[Y_n \mid \mathcal{F}_{n-1}] = 0$, and it follows that the latter quantity equals

$$ p_n\,\mathbb{E}\left[\prod_{i\in A,\, i\neq n}(Y_i+p_i)\prod_{i\in[n-1]\setminus A}(1-p_i-Y_i)\right] \leq \prod_{i\in A}p_i\prod_{i\in[n]\setminus A}(1-p_i), $$

where the inequality follows from the induction hypothesis. The second case is proven similarly and so is left to the reader. □
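A brute-force check of Lemma 3.1 on a hypothetical dependent example (an assumption for illustration only): at step $i$ the increment is $s_i(1-p_i)$ with probability $p_i$ and $-s_ip_i$ otherwise, where the scale $s_i \in [0,1]$ depends on the past. This gives $\mathbb{E}[Y_i \mid \mathcal{F}_{i-1}] = 0$ and $-p_i \leq Y_i \leq 1-p_i$. Enumerating all paths shows that, in this example, the two sides of the lemma coincide, in line with the tower-property argument.

```python
import itertools
import math


def dependent_scale(history):
    """Hypothetical predictable scale in [0, 1]: larger after a positive running sum."""
    return 0.3 + 0.7 * (sum(history) > 0)


def paths(ps, scale):
    """Yield (probability, [Y_1, ..., Y_n]) for every up/down path."""
    for moves in itertools.product((0, 1), repeat=len(ps)):
        prob, ys = 1.0, []
        for p, up in zip(ps, moves):
            s = scale(ys)  # uses only Y_1, ..., Y_{i-1}
            if up:
                prob *= p
                ys.append(s * (1 - p))
            else:
                prob *= 1 - p
                ys.append(-s * p)
        yield prob, ys


def lemma_lhs(ps, A, scale):
    """E[ prod_{i in A} (Y_i + p_i) * prod_{i not in A} (1 - p_i - Y_i) ]."""
    total = 0.0
    for prob, ys in paths(ps, scale):
        term = prob
        for i, (y, p) in enumerate(zip(ys, ps)):
            term *= (y + p) if i in A else (1 - p - y)
        total += term
    return total


def lemma_rhs(ps, A):
    return math.prod(p if i in A else 1 - p for i, p in enumerate(ps))


ps = [0.3, 0.6, 0.2, 0.5]
for r in range(len(ps) + 1):
    for A in itertools.combinations(range(len(ps)), r):
        A = set(A)
        print(sorted(A), lemma_lhs(ps, A, dependent_scale), lemma_rhs(ps, A))
```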

We are now ready to prove the main result of this section. 

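Proof of Theorem 1.10. Fix a convex, increasing and non-negative function $f(\cdot)$. Since $f(\cdot)$ is non-negative and increasing, Markov's inequality implies

$$ \mathbb{P}\left[\sum_{i=1}^{n}Y_i \geq nt\right] = \mathbb{P}\left[\sum_{i=1}^{n}(Y_i+p_i) \geq nt+np\right] \leq \frac{1}{f(nt+np)}\,\mathbb{E}\left[f\left(\sum_{i=1}^{n}(Y_i+p_i)\right)\right]. $$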


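Since $Y_i + p_i$, $i = 1,\ldots,n$, are $[0,1]$-valued random variables and $f$ is convex, we may apply Theorem 1.4 and Lemma 3.1 to conclude

$$ \mathbb{E}\left[f\left(\sum_{i=1}^{n}(Y_i+p_i)\right)\right] \leq \sum_{j=0}^{n}\sum_{A\in\partial_j[n]}\mathbb{E}\left[\prod_{i\in A}(Y_i+p_i)\prod_{i\in[n]\setminus A}(1-p_i-Y_i)\right]f(j) \leq \sum_{j=0}^{n}\sum_{A\in\partial_j[n]}\prod_{i\in A}p_i\prod_{i\in[n]\setminus A}(1-p_i)\,f(j) \leq \sum_{j=0}^{n}\binom{n}{j}p^{j}(1-p)^{n-j}f(j), $$

where the last inequality follows from Theorem 2.3. The first statement follows.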
In order to prove the second statement, and for the sake of completeness, let us first prove McDiarmid's exponential bound. Define $f_h(x) = e^{hx}$, for $h > 0$. Then we know from the first statement that

$$ \mathbb{P}\left[\sum_{i=1}^{n}Y_i \geq nt\right] \leq e^{-hn(t+p)}\sum_{j=0}^{n}\binom{n}{j}p^{j}(1-p)^{n-j}e^{hj} = e^{-hn(t+p)}\left(1-p+pe^{h}\right)^{n}. $$

If we now minimise the last expression with respect to $h > 0$, we get that $h$ must be such that $e^{h} = \frac{(p+t)(1-p)}{p(1-p-t)}$, and so

$$ \mathbb{P}\left[\sum_{i=1}^{n}Y_i \geq nt\right] \leq H_M(n,p,t), $$

where $H_M(n,p,t)$ is the function defined in Theorem 1.9. The bound $H_M(n,p,t) \leq \exp(-2t^{2}n)$ follows by employing standard estimates on the Kullback-Leibler distance. We now proceed with the second statement of the theorem. In order to simplify the notation, let us set $\ell := n(p+t)$; recall that we assume $\ell$ is a positive integer. Consider the function $g_h(x) = \max\{0, h(x-\ell)+1\}$, for the particular value $h$ obtained by minimising $e^{-s\ell}(1-p+pe^{s})^{n}$ with respect to $s > 0$, that is, $e^{h} = \frac{(p+t)(1-p)}{p(1-p-t)}$. The first statement implies that


$$ \mathbb{P}\left[\sum_{i=1}^{n}Y_i \geq nt\right] \leq \sum_{j > \ell-\frac{1}{h}}\binom{n}{j}p^{j}(1-p)^{n-j}\left(h(j-\ell)+1\right) = \mathbb{E}[g_h(B_{n,p})]. $$
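Now we can write

$$ H_M(n,p,t) - \mathbb{E}[g_h(B_{n,p})] = \sum_{j\leq \ell-\frac{1}{h}}\binom{n}{j}p^{j}(1-p)^{n-j}e^{h(j-\ell)} + \sum_{j>\ell-\frac{1}{h}}\binom{n}{j}p^{j}(1-p)^{n-j}\left(e^{h(j-\ell)}-\left(h(j-\ell)+1\right)\right). $$

Notice that the assumption on $t$ in Theorem 1.10 implies that $h > 1$ and therefore $\ell - \frac{1}{h} \in (\ell-1, \ell)$. The assumption that $\ell$ is a positive integer, i.e. a possible value of $j$, implies that, for $j = \ell$, the second term in the right hand side of the last equation evaluates to 0. The last two observations imply that

$$ H_M(n,p,t) - \mathbb{E}[g_h(B_{n,p})] = \sum_{j=0}^{\ell-1}\binom{n}{j}p^{j}(1-p)^{n-j}e^{h(j-\ell)} + \sum_{j=\ell+1}^{n}\binom{n}{j}p^{j}(1-p)^{n-j}\left(e^{h(j-\ell)}-h(j-\ell)-1\right). $$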

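Now the fact that the function $x \mapsto (1+x)e^{-x}$ is decreasing for $x > 1$ implies $1 - (1+x)e^{-x} \geq 1 - (1+h)e^{-h}$ for $x \geq h$, and so we can estimate

$$ e^{h(j-\ell)} - \left(h(j-\ell)+1\right) = \left(1-\frac{1+h(j-\ell)}{e^{h(j-\ell)}}\right)e^{h(j-\ell)} \geq \left(1-\frac{1+h}{e^{h}}\right)e^{h(j-\ell)}, $$

for $j \geq \ell+1$. This implies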


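$$ H_M(n,p,t) - \mathbb{E}[g_h(B_{n,p})] \geq \sum_{j\leq \ell-1}\binom{n}{j}p^{j}(1-p)^{n-j}e^{h(j-\ell)} + \left(1-\frac{1+h}{e^{h}}\right)\sum_{j\geq \ell+1}\binom{n}{j}p^{j}(1-p)^{n-j}e^{h(j-\ell)} $$

or, equivalently, that

$$ \mathbb{E}[g_h(B_{n,p})] \leq \frac{1+h}{e^{h}}\,H_M(n,p,t) + \left(1-\frac{1+h}{e^{h}}\right)\mathbb{P}[B_{n,p} = \ell] - \frac{1+h}{e^{h}}\sum_{j\leq \ell-1}\binom{n}{j}p^{j}(1-p)^{n-j}e^{h(j-\ell)} $$

and the second statement of Theorem 1.10 follows. The third statement has been proven in Section 1.2 and so the result follows. □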
In other words, the previous result improves upon McDiarmid's by adding a "missing factor" that is equal to $\frac{1+h}{e^{h}} < 1$.
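A sketch that evaluates the quantities of the last proof for one illustrative choice $n = 40$, $p = 0.2$, $t = 0.3$ (an assumption, chosen so that $\ell = n(p+t) = 20$ is an integer and $h = \log 4 > 1$). It computes $H_M(n,p,t)$ as $e^{-h\ell}(1-p+pe^{h})^{n}$, the expectation $\mathbb{E}[g_h(B_{n,p})]$, and the improved bound with the missing factor $(1+h)/e^{h}$.

```python
import math


def binom_pmf(n, p, j):
    return math.comb(n, j) * p**j * (1 - p) ** (n - j)


def improved_martingale_bound(n, p, t):
    """Quantities from the proof, assuming ell = n(p+t) is an integer and h > 1."""
    ell = round(n * (p + t))
    h = math.log((p + t) * (1 - p) / (p * (1 - p - t)))  # minimiser found above
    H_M = math.exp(-h * ell) * (1 - p + p * math.exp(h)) ** n
    E_g = sum(binom_pmf(n, p, j) * max(0.0, h * (j - ell) + 1) for j in range(n + 1))
    factor = (1 + h) / math.exp(h)  # the "missing factor"
    below = sum(binom_pmf(n, p, j) * math.exp(h * (j - ell)) for j in range(ell))
    bound = factor * H_M + (1 - factor) * binom_pmf(n, p, ell) - factor * below
    return h, H_M, E_g, factor, bound


h, H_M, E_g, factor, bound = improved_martingale_bound(40, 0.2, 0.3)
print(h, H_M, E_g, bound, factor)
```

For these values $H_M(n,p,t) = 0.8^{40}$, and the printed improved bound sits between $\mathbb{E}[g_h(B_{n,p})]$ and $H_M(n,p,t)$.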


4 k-wise independence - Proof of Theorem 1.12


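This section is devoted to the proof of Theorem 1.12.

Proof of Theorem 1.12. Markov's inequality implies that

$$ \mathbb{P}\left[\sum_{i=1}^{n}X_i \geq np(1+\varepsilon)\right] \leq e^{-hnp(1+\varepsilon)}\,\mathbb{E}\left[e^{h\sum_{i=1}^{n}X_i}\right], \quad h>0. $$

From Theorem 1.4 we know that

$$ \mathbb{E}\left[e^{h\sum_{i=1}^{n}X_i}\right] \leq \sum_{j=0}^{n}\sum_{A\in\partial_j[n]}\mathbb{E}\left[\prod_{i\in A}X_i\prod_{i\in[n]\setminus A}(1-X_i)\right]e^{hj} =: Q. $$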


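Fix a subset $A \subseteq [n]$ such that $|A| \leq k$. Let $B$ be any subset of $[n]\setminus A$ such that $|B| = k - |A|$. Since $A$ and $B$ are disjoint and the random variables are $[0,1]$-valued and $k$-wise independent, with common mean $p$, it follows that

$$ \mathbb{E}\left[\prod_{i\in A}X_i\prod_{i\in[n]\setminus A}(1-X_i)\right] \leq \mathbb{E}\left[\prod_{i\in A}X_i\prod_{i\in B}(1-X_i)\right] = p^{|A|}(1-p)^{k-|A|}. $$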

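Now fix a subset $A \subseteq [n]$ such that $|A| > k$. Let $A'$ be any subset of $A$ of cardinality $k$. Then

$$ \mathbb{E}\left[\prod_{i\in A}X_i\prod_{i\in[n]\setminus A}(1-X_i)\right] \leq \mathbb{E}\left[\prod_{i\in A'}X_i\right] = p^{k}. $$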
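The last two estimates yield

$$ Q \leq \sum_{j=0}^{k}\binom{n}{j}p^{j}(1-p)^{k-j}e^{hj} + \sum_{j=k+1}^{n}\binom{n}{j}p^{k}e^{hj} = \sum_{j=0}^{k}\frac{\binom{n}{j}p^{j}(1-p)^{n-j}e^{hj}}{(1-p)^{n-k}} + \sum_{j=k+1}^{n}\frac{\binom{n}{j}p^{j}(1-p)^{n-j}e^{hj}}{p^{j-k}(1-p)^{n-j}} \leq \frac{1}{(p(1-p))^{n-k}}\sum_{j=0}^{n}\binom{n}{j}p^{j}(1-p)^{n-j}e^{hj} = \frac{1}{(p(1-p))^{n-k}}\left(1-p+pe^{h}\right)^{n}, $$

where the last equation follows from the binomial theorem. Summarising, we have shown

$$ \mathbb{P}\left[\sum_{i=1}^{n}X_i \geq np(1+\varepsilon)\right] \leq \frac{e^{-hnp(1+\varepsilon)}}{(p(1-p))^{n-k}}\left(1-p+pe^{h}\right)^{n} $$

and the result follows upon minimising the last expression with respect to $h > 0$.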
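Since the factor $(p(1-p))^{-(n-k)}$ does not depend on $h$, the minimiser is the same as in Hoeffding's bound with threshold $np(1+\varepsilon)$. The sketch below, with illustrative values of $n$, $k$, $p$ and $\varepsilon$ chosen as assumptions, evaluates the bound at that minimiser, in log-space to avoid overflow, and compares it with the form $(p(1-p))^{-(n-k)}e^{-nD(p(1+\varepsilon)\,\|\,p)}$.

```python
import math


def kl(a, b):
    """Kullback-Leibler distance D(a || b) between Bernoulli(a) and Bernoulli(b)."""
    return a * math.log(a / b) + (1 - a) * math.log((1 - a) / (1 - b))


def kwise_objective(h, n, k, p, eps):
    """e^{-hnp(1+eps)} (p(1-p))^{-(n-k)} (1-p+pe^h)^n, computed in log-space."""
    log_val = (
        -h * n * p * (1 + eps)
        - (n - k) * math.log(p * (1 - p))
        + n * math.log(1 - p + p * math.exp(h))
    )
    return math.exp(log_val)


def kwise_bound(n, k, p, eps):
    """Minimum over h > 0; the factor (p(1-p))^{-(n-k)} does not depend on h."""
    a = p * (1 + eps)
    h_star = math.log(a * (1 - p) / (p * (1 - a)))
    return h_star, kwise_objective(h_star, n, k, p, eps)


n, k, p, eps = 1000, 998, 0.1, 0.5  # illustrative values
h_star, value = kwise_bound(n, k, p, eps)
print(h_star, value, math.exp(-n * kl(p * (1 + eps), p)) / (p * (1 - p)) ** (n - k))
```

The bound is only informative when $n - k$ is small compared with $nD(p(1+\varepsilon)\,\|\,p)/\log\frac{1}{p(1-p)}$, which is why the example takes $k$ close to $n$.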


5 Dependency graphs - Proof of Theorem 1.14


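In this section we prove Theorem 1.14. The proof is similar to the proof of Theorem 1.12.

Proof of Theorem 1.14. From Theorem 1.4 we can infer that

$$ \mathbb{E}\left[e^{h\sum_{i=1}^{n}B_i}\right] \leq \sum_{j=0}^{n}\sum_{A\in\partial_j[n]}\mathbb{E}\left[\prod_{i\in A}B_i\prod_{i\in[n]\setminus A}(1-B_i)\right]e^{hj}. $$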


Fix a subset $V_a \subseteq V$, of cardinality $a$, such that no two vertices of $V_a$ are adjacent, and let $I_a = \{i_1,\ldots,i_a\}$ be the indices of the vertices that belong to $V_a$. For every $A \subseteq [n]$, let us denote $L_A = A \cap I_a$ and $R_A = ([n]\setminus A) \cap I_a$. Then, for all $j \in \{0,1,\ldots,n\}$ and all $A \in \partial_j[n]$, we have


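$$ \mathbb{E}\left[\prod_{i\in A}B_i\prod_{i\in[n]\setminus A}(1-B_i)\right] \leq \mathbb{E}\left[\prod_{i\in L_A}B_i\prod_{i\in R_A}(1-B_i)\right]. $$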



Since the random variables $B_{i_k}$, $k = 1,\ldots,a$, are mutually independent, we conclude



which, in turn, implies that 



The result follows upon minimising the last expression with respect to $h > 0$. □


6 U-statistics - Proof of Theorem 1.16


This section is devoted to the proof of Theorem 1.16. The proof combines similar ideas as above together with an adaptation of the proof of Theorem 2.1 from Janson [?] (see also Hoeffding [18], Section 5). The main idea is to express $X$ as a weighted sum of random variables, each of which is a sum of independent random variables.

Proof of Theorem 1.16. Since $\xi_1,\ldots,\xi_n$ are independent and identically distributed, it follows that the random variables $F(\xi_{i_1},\ldots,\xi_{i_d})$, for distinct indices $i_1,\ldots,i_d \in [n]$, are identically distributed. Let $p$ be the expected value of the random variable $F(\xi_{i_1},\ldots,\xi_{i_d})$. Recall that we assume that $d$ divides $n$, i.e. $n = k\cdot d$, for some positive integer $k$. Let $\mathcal{P}$ be the set of all partitions of $[n]$ into $k$ subsets of cardinality $d$. We first need to know the proportion of partitions $P \in \mathcal{P}$ that contain a fixed $d$-set. To this end, notice that by symmetry each of the $\binom{n}{d}$ choices for a $d$-set belongs to the same number, say $a$, of partitions in the class $\mathcal{P}$. Furthermore, each element of $\mathcal{P}$ contains $k$ sets of cardinality $d$. Therefore, counting pairs consisting of a $d$-set and a partition that contains it in two ways,

$$ \binom{n}{d}\,a = k\,|\mathcal{P}|, $$
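where $|\mathcal{P}|$ denotes the cardinality of $\mathcal{P}$. This implies that we can write

$$ X = \frac{N_d}{|\mathcal{P}|}\sum_{P\in\mathcal{P}}\ \sum_{\{i_1,\ldots,i_d\}\in P}F(\xi_{i_1},\ldots,\xi_{i_d}), $$

where $N_d = \binom{n}{d}/k$. Notice that each term $X_P := \sum_{\{i_1,\ldots,i_d\}\in P}F(\xi_{i_1},\ldots,\xi_{i_d})$ is a sum of independent and identically distributed, $\{0,1\}$-valued random variables $F(\xi_{i_1},\ldots,\xi_{i_d})$, $\{i_1,\ldots,i_d\} \in P$, whose mean equals $p$ or, in other words, $X_P$ is a binomial random variable of parameters $k$ and $p$. Furthermore, notice that $\mathbb{E}[X] = \binom{n}{d}p = kN_dp$.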
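The double-counting identity and the resulting representation of $X$ can be checked by enumeration. The sketch below assumes, for illustration only, $n = 6$, $d = 2$, one random $\{0,1\}$-valued realisation of the summands $F(\xi_{i_1},\ldots,\xi_{i_d})$, and the normalisation $N_d = \binom{n}{d}/k$; the enumeration routine is a hypothetical helper.

```python
import itertools
import math
import random


def partitions_into_blocks(elements, d):
    """All partitions of the tuple `elements` into blocks of size d (blocks as sorted tuples)."""
    if not elements:
        yield []
        return
    first, rest = elements[0], elements[1:]
    for others in itertools.combinations(rest, d - 1):
        block = (first,) + others
        remaining = tuple(e for e in rest if e not in others)
        for tail in partitions_into_blocks(remaining, d):
            yield [block] + tail


n, d = 6, 2
k = n // d
parts = list(partitions_into_blocks(tuple(range(n)), d))
fixed_block = (2, 5)
a = sum(fixed_block in P for P in parts)  # partitions containing a fixed d-set
N_d = math.comb(n, d) // k  # assumed normalisation N_d = C(n, d) / k

rng = random.Random(1)
F = {S: rng.randint(0, 1) for S in itertools.combinations(range(n), d)}  # one realisation
X = sum(F.values())
averaged = N_d * sum(sum(F[S] for S in P) for P in parts) / len(parts)
print(len(parts), a, X, averaged)
```

Every partition $P$ contributes a sum $X_P$ of $k$ summands over disjoint index sets, which is what makes each $X_P$ a sum of independent variables.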

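Fix a convex, increasing and non-negative function $f(\cdot)$ and set $y := \mathbb{E}[X] + kN_dt = kN_d(p+t)$. Markov's inequality and the assumption that $f$ is non-negative and increasing imply

$$ \mathbb{P}[X \geq y] \leq \frac{1}{f(y)}\,\mathbb{E}[f(X)]. $$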

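The assumption that $f(\cdot)$ is convex yields

$$ \mathbb{E}[f(X)] = \mathbb{E}\left[f\left(\frac{1}{|\mathcal{P}|}\sum_{P\in\mathcal{P}}N_dX_P\right)\right] \leq \frac{1}{|\mathcal{P}|}\sum_{P\in\mathcal{P}}\mathbb{E}[f(N_dX_P)] = \mathbb{E}[f(N_dB_{k,p})], $$

where the last equation follows from the fact that each $X_P$ is a binomial random variable of parameters $k$ and $p$. The first statement follows. For the sake of completeness, we proceed by proving the exponential bound in Theorem 1.15. Let $h$ be a positive real, to be chosen later, and consider the function $f_h(x) = e^{hx}$, $x \in \mathbb{R}$. Clearly, $f_h$ is convex, increasing and non-negative, and the first statement together with the binomial theorem yield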


$$ \mathbb{P}[X \geq y] \leq e^{-hy}\sum_{j=0}^{k}\binom{k}{j}p^{j}(1-p)^{k-j}e^{hN_dj} = e^{-hy}\left(1-p+pe^{hN_d}\right)^{k}, \quad \text{for } h>0. $$

If we minimise the last expression with respect to $h > 0$, we get that $h$ must satisfy $e^{hN_d} = \frac{(p+t)(1-p)}{p(1-p-t)}$. Substituting this into the last expression and recalling that $y = kN_d(p+t)$ gives


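$$ \mathbb{P}[X \geq y] \leq \left[\left(\frac{p}{p+t}\right)^{p+t}\left(\frac{1-p}{1-p-t}\right)^{1-p-t}\right]^{k} = e^{-kD(p+t\,\|\,p)} \leq e^{-2kt^{2}}, $$

where the last inequality follows from the standard estimate $D(p+t\,\|\,p) \geq 2t^{2}$ on the Kullback-Leibler distance. We now prove the second statement. Let $h$ be as above, i.e. $e^{hN_d} = \frac{(p+t)(1-p)}{p(1-p-t)}$, and let $g_h(x)$, $x \in \mathbb{R}$, be the function defined by $g_h(x) = \max\{0, h(x-y)+1\}$. The first statement implies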


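$$ \mathbb{P}[X \geq y] \leq \sum_{j > k(p+t)-\frac{1}{hN_d}}\binom{k}{j}p^{j}(1-p)^{k-j}\left(h(N_dj-y)+1\right) = \mathbb{E}[g_h(N_dB_{k,p})], $$

where $B_{k,p}$ is a binomial random variable of parameters $k$ and $p$. Let us denote $H_U(k,p,t) = \sum_{j=0}^{k}\binom{k}{j}p^{j}(1-p)^{k-j}e^{h(N_dj-y)}$. Recall that $y = kN_d(p+t)$ and notice that we can write

$$ H_U(k,p,t) - \mathbb{E}[g_h(N_dB_{k,p})] = \sum_{j\leq k(p+t)-\frac{1}{hN_d}}\binom{k}{j}p^{j}(1-p)^{k-j}e^{h(N_dj-y)} + \sum_{j>k(p+t)-\frac{1}{hN_d}}\binom{k}{j}p^{j}(1-p)^{k-j}\left(e^{h(N_dj-y)}-\left(h(N_dj-y)+1\right)\right). $$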


By the assumed lower bound on $t$, it follows that $hN_d > 1$ and therefore $k(p+t) - \frac{1}{hN_d}$ belongs to the interval $(k(p+t)-1, k(p+t))$. As $k(p+t)$ is assumed to be an integer we can rewrite the last equation as

$$ H_U(k,p,t) - \mathbb{E}[g_h(N_dB_{k,p})] = \sum_{j\leq k(p+t)-1}\binom{k}{j}p^{j}(1-p)^{k-j}e^{h(N_dj-y)} + \sum_{j\geq k(p+t)}\binom{k}{j}p^{j}(1-p)^{k-j}\left(e^{h(N_dj-y)}-h(N_dj-y)-1\right). $$

Notice that for $j = k(p+t)$ the second term in the right hand side evaluates to 0. Since the function $x \mapsto (1+x)e^{-x}$ is decreasing for $x > 1$ we can estimate, for every positive integer $j$ such that $j \geq k(p+t)+1$,




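$$ e^{h(N_dj-y)} - \left(h(N_dj-y)+1\right) = \left(1-\frac{h(N_dj-y)+1}{e^{h(N_dj-y)}}\right)e^{h(N_dj-y)} \geq \left(1-\frac{hN_d+1}{e^{hN_d}}\right)e^{h(N_dj-y)}, $$

and this implies that

$$ H_U(k,p,t) - \mathbb{E}[g_h(N_dB_{k,p})] \geq \sum_{j\leq k(p+t)-1}\binom{k}{j}p^{j}(1-p)^{k-j}e^{h(N_dj-y)} + \left(1-\frac{hN_d+1}{e^{hN_d}}\right)\sum_{j\geq k(p+t)+1}\binom{k}{j}p^{j}(1-p)^{k-j}e^{h(N_dj-y)} $$

or, equivalently, that

$$ \mathbb{E}[g_h(N_dB_{k,p})] \leq \frac{hN_d+1}{e^{hN_d}}\,H_U(k,p,t) + \left(1-\frac{hN_d+1}{e^{hN_d}}\right)\mathbb{P}[B_{k,p} = k(p+t)] - \frac{hN_d+1}{e^{hN_d}}\sum_{j=0}^{k(p+t)-1}\binom{k}{j}p^{j}(1-p)^{k-j}e^{h(N_dj-y)} $$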


and the second statement follows. To prove the third statement, note that the previous bound is at most

$$ \frac{hN_d+1}{e^{hN_d}}\exp(-2kt^{2}) + \left(1-\frac{hN_d+1}{e^{hN_d}}\right)\mathbb{P}[B_{k,p} = k(p+t)] =: Q. $$

Now, Hoeffding's Theorem 1.1 implies that

$$ \mathbb{P}[B_{k,p} = k(p+t)] \leq \mathbb{P}[B_{k,p} \geq k(p+t)] \leq \exp(-2kt^{2}) $$

and the third statement follows from the fact that $Q$ is a convex combination of $\exp(-2kt^{2})$ and $\mathbb{P}[B_{k,p} = k(p+t)]$. □
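A sketch of the three bounds of this section for one illustrative choice $k = 20$, $p = 0.2$, $t = 0.3$, $N_d = 5$ (assumptions; they make $k(p+t) = 10$ an integer and $hN_d = \log 4 > 1$). It evaluates $H_U(k,p,t)$, the expectation $\mathbb{E}[g_h(N_dB_{k,p})]$ and the convex combination $Q$, so that the chain $\mathbb{E}[g_h(N_dB_{k,p})] \leq Q \leq e^{-2kt^{2}}$ can be seen numerically.

```python
import math


def binom_pmf(k, p, j):
    return math.comb(k, j) * p**j * (1 - p) ** (k - j)


def u_statistic_bounds(k, p, t, N_d):
    """Quantities from the proof; assumes k(p+t) is an integer and h*N_d > 1."""
    m = round(k * (p + t))
    hN = math.log((p + t) * (1 - p) / (p * (1 - p - t)))  # h * N_d at the minimiser
    h, y = hN / N_d, k * N_d * (p + t)
    H_U = sum(binom_pmf(k, p, j) * math.exp(h * (N_d * j - y)) for j in range(k + 1))
    E_g = sum(binom_pmf(k, p, j) * max(0.0, h * (N_d * j - y) + 1) for j in range(k + 1))
    c = (hN + 1) / math.exp(hN)
    Q = c * math.exp(-2 * k * t**2) + (1 - c) * binom_pmf(k, p, m)
    return hN, H_U, E_g, Q


k, p, t, N_d = 20, 0.2, 0.3, 5  # illustrative values
hN, H_U, E_g, Q = u_statistic_bounds(k, p, t, N_d)
print(H_U, E_g, Q, math.exp(-2 * k * t**2))
```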


7 Some applications 


In this section we discuss some applications of Theorem 1.2 and Theorem 1.3 to the theory of random graphs. Recall (see [19], Theorem 3.1) that in this case the constant $c$ of Theorem 1.2 is equal to 1. We employ this result in order to obtain concentration inequalities for particular sums of weakly dependent indicators that are encountered in the theory of Erdős-Rényi random graphs. Let us mention that we do not intend to provide optimal concentration bounds; our intention is to emphasize that Theorems 1.2 and 1.3, combined with some elementary combinatorial results, yield certain concentration inequalities in a rather direct and simple manner. Notice that, in order to apply Theorem 1.2 to a specific problem, one has to determine the constant $\gamma$. In this section we find this constant for particular problems from the theory of Erdős-Rényi random graphs. Recall that such graphs, on $n$ vertices, are constructed by joining pairs of labelled vertices with probability $p \in (0,1)$, independently of all other pairs. Let $G \in \mathcal{G}(n,p)$ be an Erdős-Rényi random graph and denote by $I_{n,p}$ the number of isolated vertices in $G$. Recall that a vertex is called isolated if its degree equals zero. Below we provide a concentration inequality for $I_{n,p}$. For sharper results on this problem we refer the reader to Ghosh et al. [?].


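Proposition 7.1. Let $G \in \mathcal{G}(n,p)$ be an Erdős-Rényi random graph. Let $I_{n,p}$ be the number of isolated vertices in $G$ and fix a real number $t$ such that $n(1-p)^{(n-1)/2} \leq t \leq n$ and write $t = n(1-p)^{(n-1)/2}(1+\varepsilon)$, for some $\varepsilon > 0$. Then

$$ \mathbb{P}[I_{n,p} \geq t] \leq e^{-nD(\gamma(1+\varepsilon)\,\|\,\gamma)}, $$

where $\gamma = (1-p)^{(n-1)/2}$.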


Proof. For every vertex $v_i$, $i = 1,\ldots,n$, let $J_i$ be the indicator of the event "$v_i$ is isolated". Then $I_{n,p} = \sum_{i=1}^{n}J_i$ and $\mathbb{E}[I_{n,p}] = n(1-p)^{n-1}$. Let $A \subseteq [n]$ be a set of cardinality $j \in \{0,1,\ldots,n\}$. Theorem 1.2 requires us to find a constant $\gamma$ such that $\mathbb{E}[\prod_{i\in A}J_i] \leq \gamma^{j}$. We claim that we may choose $\gamma = (1-p)^{(n-1)/2}$; the result then follows from Theorem 1.2. To prove the claim, notice that the expression on the left hand side of the previous inequality equals the probability that the vertices $v_i$, $i \in A$, are isolated, which happens with probability $(1-p)^{\binom{j}{2}+j(n-j)}$. Hence

$$ \mathbb{E}\left[\prod_{i\in A}J_i\right] = (1-p)^{\binom{j}{2}+j(n-j)} \leq (1-p)^{j(n-1)/2} = \gamma^{j}, $$

since $\binom{j}{2}+j(n-j) = \frac{j(2n-j-1)}{2} \geq \frac{j(n-1)}{2}$ for $j \leq n$, as required. □
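A short numerical companion to Proposition 7.1, with illustrative values of $n$, $p$ and $\varepsilon$ (assumptions). It checks the exponent inequality used in the proof for small $n$ and evaluates $\gamma$ and the resulting bound. Note that $\gamma = (1-p)^{(n-1)/2} \geq (1-p)^{n-1}$, so the threshold $n\gamma(1+\varepsilon)$ always lies above $\mathbb{E}[I_{n,p}] = n(1-p)^{n-1}$.

```python
import math


def kl(a, b):
    """Kullback-Leibler distance D(a || b) between Bernoulli(a) and Bernoulli(b)."""
    return a * math.log(a / b) + (1 - a) * math.log((1 - a) / (1 - b))


def exponent_inequality_holds(n):
    """C(j,2) + j(n-j) >= j(n-1)/2 for every j in {0, ..., n}."""
    return all(math.comb(j, 2) + j * (n - j) >= j * (n - 1) / 2 for j in range(n + 1))


def isolated_vertices_bound(n, p, eps):
    """gamma = (1-p)^{(n-1)/2} and the bound exp(-n D(gamma(1+eps) || gamma))."""
    gamma = (1 - p) ** ((n - 1) / 2)
    return gamma, math.exp(-n * kl(gamma * (1 + eps), gamma))


n, p, eps = 100, 0.02, 1.0  # illustrative values; need gamma(1+eps) < 1
gamma, bound = isolated_vertices_bound(n, p, eps)
print(gamma, n * gamma * (1 + eps), bound)
```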

We now proceed with a concentration inequality on yet another sum of dependent indicators. Let $G \in \mathcal{G}(n,p)$ be an Erdős-Rényi random graph and denote by $T_{n,p}$ the number of triangles in $G$. The problem of obtaining upper bounds on the probability that $T_{n,p}$ is larger than its mean is classical; we refer the reader to the works of DeMarco et al. [?], Janson [20] and Kim et al. [24] for much sharper bounds and references. Using Theorem 1.2 one can obtain the following concentration bound on the number of triangles in a random graph.

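Proposition 7.2. Let $G \in \mathcal{G}(n,p)$ be an Erdős-Rényi random graph and denote by $T_{n,p}$ the number of triangles in $G$. Fix a real number $t$ such that $\binom{n}{3}\gamma \leq t \leq \binom{n}{3}$ and write $t = \binom{n}{3}\gamma(1+\varepsilon)$, for some $\varepsilon > 0$. Then

$$ \mathbb{P}[T_{n,p} \geq t] \leq e^{-\binom{n}{3}D(\gamma(1+\varepsilon)\,\|\,\gamma)}, $$

where $\gamma = p^{3/(n-2)}$.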

The proof of the previous result is based upon the following Mantel-type result. 

Lemma 7.3. Fix $n \geq 3$, set $N = \binom{n}{3}$ and suppose that $G = (V,E)$ is a graph on $n$ vertices having $j$ triangles, where $j \in \{0,1,\ldots,N\}$. For every triangle $T_i$, $i = 1,\ldots,j$, in $G$, let $E_i$ be the set consisting of the three edges that belong to $T_i$ and set $R = \bigcup_{i=1}^{j}E_i$. Then $R$ contains at least $\frac{3j}{n-2}$ edges.

Proof. Let $|R|$ denote the cardinality of $R$. Count pairs $(e, v)$, where $e = (v_1, v_2)$ is an edge from $R$ and $v$ is a vertex from $V\setminus\{v_1, v_2\}$. Now, on one hand, the number of such pairs is at most $|R|\cdot(n-2)$. On the other hand, each triangle of $G$ is counted exactly three times. Thus

$$ |R|\cdot(n-2) \geq 3j $$

and the result follows. □
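A brute-force sanity check of Lemma 7.3 on random graphs, assuming illustrative values $n = 9$ and edge density $0.6$ (the seeds and helper names are hypothetical). It also confirms that the complete graph attains equality, and that the counting argument applies to any sub-collection of triangles, which is how the lemma is used in the proof of Proposition 7.2.

```python
import itertools
import random


def random_graph(n, p, seed):
    rng = random.Random(seed)
    return {e for e in itertools.combinations(range(n), 2) if rng.random() < p}


def triangles(n, edges):
    return [
        T for T in itertools.combinations(range(n), 3)
        if all(e in edges for e in itertools.combinations(T, 2))
    ]


def edges_spanned(tris):
    """R: the set of edges belonging to at least one of the given triangles."""
    return {e for T in tris for e in itertools.combinations(T, 2)}


def lemma_holds(n, tris):
    """|R| * (n - 2) >= 3 * (number of triangles)."""
    return len(edges_spanned(tris)) * (n - 2) >= 3 * len(tris)


n = 9
for seed in range(20):
    tris = triangles(n, random_graph(n, 0.6, seed))
    print(len(tris), len(edges_spanned(tris)), lemma_holds(n, tris))
```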

Proof of Proposition 7.2. Set $N := \binom{n}{3}$. Let $T_i$, $i = 1,\ldots,N$, be an enumeration of all potential triangles in $G$. Given a triangle $T_i$, let $E_i$ denote the set consisting of the three edges that belong to $T_i$, $i = 1,\ldots,N$. Define $X_i$ to be the indicator of the event that triangle $T_i$ is present in $G$. Then the number of triangles in $G$ equals $T_{n,p} = \sum_{i=1}^{N}X_i$. In order to apply Theorem 1.2 we need to find an upper bound on $\mathbb{E}[\prod_{i\in A}X_i]$, for $A \in \partial_j[N]$ and $j \in \{0,1,\ldots,N\}$. Let $j \in \{0,1,\ldots,N\}$ be such that there exist graphs on $n$ vertices having $j$ triangles and note that, for $A \in \partial_j[N]$, we have

$$ \mathbb{E}\left[\prod_{i\in A}X_i\right] = \mathbb{P}[X_i = 1, \text{ for } i \in A] = p^{|\bigcup_{i\in A}E_i|}. $$

By Lemma 7.3, $|\bigcup_{i\in A}E_i| \geq \frac{3j}{n-2}$, so the last quantity is at most $\left(p^{3/(n-2)}\right)^{j} = \gamma^{j}$, which finishes the proof.

□ 


Clearly, Proposition 7.2 is not very informative; the constant $\gamma$ is quite large. Perhaps more sophisticated versions of Lemma 7.3 can provide smaller values of $\gamma$. Let us remark that Lemma 7.3 may be iterated to produce bounds on the number of cliques in an Erdős-Rényi random graph. Let us illustrate this with the number of 4-cliques. Let $G \in \mathcal{G}(n,p)$ be an Erdős-Rényi random graph and denote by $Q_{n,p}$ the number of 4-cliques in $G$. We first provide a lower bound on the number of triangles of a graph in terms of its 4-cliques.

Lemma 7.4. Fix $n \geq 4$, set $N_4 = \binom{n}{4}$ and suppose that $G = (V,E)$ is a graph on $n$ vertices having $k$ 4-cliques, where $k \in \{0,1,\ldots,N_4\}$. For every 4-clique $Q_i$, $i = 1,\ldots,k$, in $G$, let $\mathcal{T}_i$ be the set consisting of the four triangles that belong to $Q_i$ and set $R = \bigcup_{i=1}^{k}\mathcal{T}_i$. Then $R$ contains at least $\frac{4k}{n-3}$ triangles and so, by Lemma 7.3, at least $\frac{12k}{(n-2)(n-3)}$ edges.

Proof. Count pairs $(T, v)$, where $T$ is a triangle from $R$ and $v$ is a vertex from $V$ which is different from the vertices of the triangle $T$. The number of such pairs is at most $|R|\cdot(n-3)$ and each 4-clique is counted exactly 4 times, so that $|R|\cdot(n-3) \geq 4k$. □

We can therefore conclude the following, rather crude, bound whose proof is similar to the proof of Proposition 7.2 and so is left to the reader.

Proposition 7.5. Let $G \in \mathcal{G}(n,p)$ be an Erdős-Rényi random graph and denote by $Q_{n,p}$ the number of 4-cliques in $G$. Fix a real number $t$ such that $\binom{n}{4}p^{12/((n-2)(n-3))} \leq t \leq \binom{n}{4}$ and write $t = \binom{n}{4}p^{12/((n-2)(n-3))}(1+\varepsilon)$, for some $\varepsilon > 0$. Then

$$ \mathbb{P}[Q_{n,p} \geq t] \leq e^{-\binom{n}{4}D(\gamma(1+\varepsilon)\,\|\,\gamma)}, $$

where $\gamma = p^{12/((n-2)(n-3))}$.

Let us proceed with some applications of Theorem 1.3 to another model of random graphs, namely $\mathcal{G}(n,m)$. Recall that such a graph is obtained by selecting uniformly at random a graph, $G$, from the set of all labelled graphs on $n$ vertices and $m$ edges. We begin with a concentration bound on the number of isolated vertices in $G$.
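Proposition 7.6. Let $G \in \mathcal{G}(n,m)$ and denote by $I_{n,m}$ the number of isolated vertices in $G$. Let $t$ be a positive integer. Then

$$ \mathbb{P}[I_{n,m} \geq t] \leq \min_{0<k<t}\frac{\binom{n}{k}}{\binom{t}{k}}\cdot\frac{\binom{\binom{n-k}{2}}{m}}{\binom{\binom{n}{2}}{m}}. $$

Proof. Let $J_i$ be the indicator of the event "vertex $i$ is isolated". Then $I_{n,m} = \sum_{i=1}^{n}J_i$. For fixed $A \subseteq [n]$ of cardinality $k$ we have $\mathbb{E}[\prod_{i\in A}J_i] = \mathbb{P}[J_i = 1, \text{ for all } i \in A]$ and the latter probability equals

$$ \mathbb{P}[J_i = 1, \text{ for } i \in A] = \frac{\binom{\binom{n-k}{2}}{m}}{\binom{\binom{n}{2}}{m}}, $$

since all $m$ edges must be chosen among the $\binom{n-k}{2}$ pairs of vertices that avoid $A$. Theorem 1.3 finishes the proof.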


□ 
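A sketch that evaluates the bound of Proposition 7.6 for one illustrative choice $n = 40$, $m = 60$, $t = 8$ (assumptions, not from the text). The $k = 1$ term equals $\mathbb{E}[I_{n,m}]/t$, i.e. Markov's inequality, so the minimum over $k$ can never be worse; the helper names are hypothetical.

```python
import math


def prob_k_isolated(n, m, k):
    """P[k fixed vertices are isolated] in G(n, m): all m edges avoid those vertices."""
    return math.comb(math.comb(n - k, 2), m) / math.comb(math.comb(n, 2), m)


def isolated_bound(n, m, t):
    """min over 0 < k < t of C(n,k)/C(t,k) * P[k fixed vertices are isolated]."""
    return min(
        math.comb(n, k) / math.comb(t, k) * prob_k_isolated(n, m, k) for k in range(1, t)
    )


def markov_bound(n, m, t):
    """E[I_{n,m}] / t, i.e. the k = 1 term."""
    return n * prob_k_isolated(n, m, 1) / t


n, m, t = 40, 60, 8  # illustrative values
print(isolated_bound(n, m, t), markov_bound(n, m, t))
```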


Of course, the previous bound is useful for $t$ such that $t > \mathbb{E}[I_{n,m}] = n\binom{\binom{n-1}{2}}{m}\Big/\binom{\binom{n}{2}}{m}$. Note that, for such $t$, the bound of the previous result reduces to Markov's inequality when $k = 1$ and so the minimum over $k \in \{1,\ldots,t-1\}$ provides a bound at least as good as Markov's inequality. Our paper ends with a concentration bound on the number of triangles in $G \in \mathcal{G}(n,m)$.


Proposition 7.7. Let $G \in \mathcal{G}(n,m)$ and denote by $T_{n,m}$ the number of triangles in $G$. Let $t$ be a positive integer from the set $\{2,\ldots,\binom{n}{3}\}$. Then

$$ \mathbb{P}[T_{n,m} \geq t] \leq \min_{0<k<t}\frac{\binom{N}{k}}{\binom{t}{k}}\cdot\frac{\binom{\binom{n}{2}-s_k}{m-s_k}}{\binom{\binom{n}{2}}{m}}, \qquad \text{where } N = \binom{n}{3} \text{ and } s_k = \left\lceil\frac{3k}{n-2}\right\rceil. $$

Proof. Set $N = \binom{n}{3}$. Let $T_i$, $i = 1,\ldots,N$, be an enumeration of all potential triangles, let $V_i$ be the set consisting of the three vertices of triangle $T_i$ and let $E_i$ be the set consisting of the three edges of $T_i$. Let $J_i$ be the indicator of the event "triangle $T_i$ is present in $G$". Then $T_{n,m} = \sum_{i=1}^{N}J_i$. Fix a positive integer $k$ such that $0 < k < t$. If $A \subseteq [N]$ is a set of indices of cardinality $k$ then $\mathbb{E}[\prod_{i\in A}J_i]$ equals the probability that the triangles $T_i$, $i \in A$, are all present in $G$. The set of vertices $\bigcup_{i\in A}V_i$ and the set of edges $\bigcup_{i\in A}E_i$ induce a (potential) graph that has at least $k$ triangles and so, by Lemma 7.3, it has at least $\frac{3k}{n-2}$ edges. We can thus associate to each non-empty collection $\{T_i\}_{i\in A}$ of $k$ triangles a set $E'$ consisting of $s_k$ edges in such a way that if the triangles $\{T_i\}_{i\in A}$ are present in $G$ then the edges from $E'$ are also present in $G$. This implies that

$$ \mathbb{P}[J_i = 1, \text{ for } i \in A] \leq \mathbb{P}[E' \subseteq E(G)] = \frac{\binom{\binom{n}{2}-s_k}{m-s_k}}{\binom{\binom{n}{2}}{m}} $$

and the result follows from Theorem 1.3.


□ 